

Steganography  Detection  Using  Entropy  Measures 

Technical  Report 

By  Eduardo  Melendez 

Universidad  Politecnica  de  Puerto  Rico 


November  16,  2012 


DoD HBCU # W911NF-11-1-0174:

"Enhancing  Research  in  Networking  &  System  Security,  and 

Forensics,  in  Puerto  Rico" 

Alfredo  Cruz,  PhD 

Jeff  Duffany,  PhD 
Eduardo  Melendez,  MS 


This  material  is  based  upon  work  supported  by,  or  in  part  by,  the  U.  S.  Army 
Research  Laboratory  and  the  U.  S.  Army  Research  Office  under  contract/grant 

number:  W911NF1110174. 


melendez.eduardo@gmail.com


Contents

I. Introduction
1. Introduction
2. History
2.1. Steganography concepts
2.2. Steganography Vs Cryptography
2.3. Different kinds of steganography

II. Steganography
3. Images and Significance of Bits
3.1. Image definition
3.2. Image Compression
3.3. Least Significant Bit (LSB)
3.4. Significant Bit Image Depiction
4. Steganography Ad Hoc Methods
4.1. Linear Ordered Embedding
4.1.1. Line-by-Line Linear Embedding
4.1.2. Uniform Spread Embedding (USE)
4.1.3. Pixel LSBP
4.1.4. LSB
4.2. Diagonal
4.2.1. Top-To-Bottom-Right-To-Left-Left Corner
4.2.2. Top-To-Bottom-Left-To-Right-Left Corner
4.2.3. Top-To-Bottom-Left-To-Right-Right Corner
4.2.4. Top-To-Bottom-Left-To-Right-Right Corner
4.2.5. Bottom-To-Top-Left-To-Right-Left Corner
4.2.6. Bottom-To-Top-Right-To-Left-Left Corner
4.2.7. Bottom-To-Top-Left-To-Right-Right Corner
4.2.8. Bottom-To-Top-Right-To-Left-Right Corner

III. Steganalysis
5. Discrete Cosine Transform (DCT)
5.1. DCT coefficients in general
5.2. DCTs of a 8 x 8 block matrix
5.3. DCT coefficients of an image
5.4. Quantization Index Modulation (QIM)
5.5. Properties of an ensemble function
5.6. Calculating probabilities for quantized values
5.7. Effect of embedding
5.8. Expected Value Estimate
6. Entropy

IV. References
7. References

V. Appendix
8. Appendix
9. R-Codes

Part  I. 

Introduction 

1.  Introduction 

Steganography is the science of hiding the fact that communication is taking place. In general, encryption, encoding and decoding are not required to accomplish steganography. However, encryption serves as a layer of protection when steganography fails.

The first objective of steganography is to hide the existence of a data exchange between two parties. To achieve this, any manipulation of the medium or carrier must remain beyond the reach of human perception. Second, steganography must cause little or no impact on the carrier's structure, so that no suspicion of manipulation arises. Third, the carrier must be robust: the medium should withstand a certain level of modification before the existence of data is detected. Finally, the medium should have enough capacity to hold a certain amount of information before the hidden information is detected.

Steganography is studied taking into consideration the capability to detect the information transferred. This includes steganalysis, the techniques and methods used to detect steganography. The better steganography performs, the more complex the methods of detection must become, so the quality of an embedding is judged against the efficiency of the detection technique.

There are two problems in steganalysis: (1) detecting the existence of a hidden message and (2) decoding the message. As terrorist groups have been known to use steganography in planning their attacks, this has become an important problem of national security. This technical report is concerned only with, first, embedding techniques and, second, the problem of hidden message detection using steganalysis.

The approach is to statistically analyse the least significant bit(s) of each color dimension of each pixel to look for some kind of pattern. In the absence of a hidden message these bits should look like random noise. The addition of a hidden message will affect the entropy of the data. This difference should be detectable by comparing the entropy of unaltered picture files with the entropy of files with embedded steganography. Many software packages for steganography and steganalysis are available on the market. However, in order to better understand steganography we have developed simple procedures for embedding. The main programming tool used for this purpose is the R language, because of its broad scope and strength in statistical analysis. A second programming language used was C#, for its well structured user interface development tools.

We have outlined the steps to proceed with the development of our research. These are as follows:


•  Obtain  sample  jpegs  from  the  Internet  or  other  source 

•  Import  these  sample  files  as  data  files  into  R  Language 

•  Statistically  analyse  least  significant  bits. 

•  Use  steganography  to  hide  messages  in  a  sample  of  jpeg  files. 

•  Import  as  a  data  file  into  R  language  and  statistically  analyse  the  least  significant 
bits  of  the  jpeg  files  with  known  hidden  messages. 

•  Compare  with  original  file  in  terms  of  entropy. 
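The steps above can be sketched in R on synthetic data. This is a minimal sketch under stated assumptions, not the report's own code: the function names `shannon_entropy` and `lsb_plane`, the random 64 x 64 matrix standing in for one channel of an imported image, and the sample message are all hypothetical. Whether the entropy difference is large enough to detect depends on the image and on the message.

```r
# Shannon entropy (in bits per symbol) of a vector of discrete symbols.
shannon_entropy <- function(x) {
  p <- table(x) / length(x)
  -sum(p * log2(p))
}

# Least significant bit of every value of an integer matrix (values 0-255).
lsb_plane <- function(M) bitwAnd(as.integer(M), 1L)

set.seed(1)
# Hypothetical stand-in for one color channel of an imported image.
cover <- matrix(sample(0:255, 64 * 64, replace = TRUE), nrow = 64)

# Bits of a sample message, 8 bits per character, most significant bit first.
codes <- utf8ToInt("meet me at noon")
msg_bits <- unlist(lapply(codes, function(ch) rev(as.integer(intToBits(ch))[1:8])))

# Overwrite the least significant bits of the first pixels with the message.
stego <- cover
idx <- seq_along(msg_bits)
stego[idx] <- bitwOr(bitwAnd(stego[idx], 254L), msg_bits)

# Compare the entropy of the LSB plane before and after embedding.
H_cover <- shannon_entropy(lsb_plane(cover))
H_stego <- shannon_entropy(lsb_plane(stego))
print(c(cover = H_cover, stego = H_stego))
```

With real JPEG files the matrix `cover` would come from an image-reading package instead of `sample`; that import step is left out here on purpose.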

2.  History 

There are many instances, with particular purposes, where steganography has been used. Three instances in particular are generally mentioned.

From ancient times, the Greek historian Herodotus writes of a nobleman, Histiaeus, who needed to communicate with his son-in-law in Greece. He shaved the head of one of his most trusted slaves and tattooed the message onto the slave's scalp. When the slave's hair grew back, the slave was dispatched with the hidden message.

During the Second World War the microdot technique was developed by the Germans. Information, especially photographs, was reduced in size until it was the size of a typed period. The microdot was extremely difficult to detect: a normal cover message was sent over an insecure channel with one of the periods on the paper containing the hidden information.

Today steganography is mostly used on computers, with digital data being the carriers and networks being the high speed delivery channels.

Definitions are important to understand concepts, and we may say that steganography is the art of hiding the fact that communication is taking place, by hiding information in other information or in another medium.


2.1.  Steganography  concepts 

Although steganography is an ancient subject, its modern formulation focuses on two ideas, the passive and the active warden. They arise from the example of a warden who has knowledge of the communication between two inmates. This view was proposed by Simmons: two inmates wish to communicate in secret to hatch a plan to escape from prison. Their communication passes through the warden, who will throw them in solitary confinement should she suspect any covert communication.


The  warden,  who  is  free  to  examine  all  communication  exchanged  between  the  inmates, 
can  either  be  passive  or  active. 


•  passive: A passive warden simply examines the communication to try to determine whether it potentially contains secret information. If she suspects that a communication contains hidden information, a passive warden takes note of the detected covert communication, reports this to some outside party and lets the message through without blocking it.

•  active: An active warden, on the other hand, will deliberately try to alter the communication with the suspected hidden information, in order to remove the information.

Knowing of the existence of some communication between two parties allows you either to alter the information contained in and passed through the medium, or just to let it pass through.

2.2.  Steganography  Vs  Cryptography 

There are many differences between steganography and cryptography. Neither needs the other to fulfil its purpose. However, the two do coexist and, furthermore, they become a powerful security tool when used together.

Steganography differs from cryptography in the sense that, where cryptography focuses on keeping the contents of a message secret, steganography focuses on keeping the existence of a message secret.

Steganography and cryptography are both ways to protect information from unwanted parties, but neither technology alone is perfect: either can be compromised.

Once  the  presence  of  hidden  information  is  revealed  or  even  suspected,  the  purpose 
of  steganography  is  partly  defeated. 

The  strength  of  steganography  can  thus  be  amplified  by  combining  it  with  cryptography. 

Two other technologies that are closely related to steganography are watermarking and fingerprinting. These technologies are mainly concerned with the protection of intellectual property, thus the algorithms have different requirements than steganography.

2.3.  Different  kinds  of  steganography 

There are four categories of steganography. Almost all digital file formats can be used for steganography, but the formats that are more suitable are those with a high degree of redundancy. Redundancy can be defined as the bits of an object that provide accuracy far greater than necessary for the object's use and display. The redundant bits of an object are those bits that can be altered without the alteration being detected easily. Image and audio files especially comply with this requirement, while research has also uncovered other file formats that can be used for information hiding.

•  Text: An obvious method was to hide a secret message in every nth letter of every word of a text message.

•  Images: Given the large amount of redundant bits present in the digital representation of an image, images are the most popular cover objects for steganography.

•  Audio / Video: A different technique unique to audio steganography is masking, which exploits the properties of the human ear to hide information unnoticeably. A faint, but audible, sound becomes inaudible in the presence of another louder audible sound. This property creates a channel in which to hide information.

•  Protocol: The term protocol steganography refers to the technique of embedding information within messages and network control protocols used in network transmission. In the layers of the OSI network model there exist covert channels where steganography can be used.


Part  II. 

Steganography 

3.  Images  and  Significance  of  Bits 

The main objective of steganography is that communication occurs without attracting attention. Information is exchanged hidden from everyone but the interested parties, i.e. the sender and the receiver. However, once the communication has been compromised, steganography simply fails.

Information is exchanged between two or more parties through a communication medium. These media range from text, images and videos to more complex ones. In our research the medium of communication is images, in particular JPEG files. The reason is that JPEG is the most common image format currently in use. Furthermore, its structure is simple to manipulate.

3.1.  Image  definition 

Images are a very useful medium in which to hide messages (embedding). Embedding is accomplished by manipulating certain bits in the binary color representation. A monochrome picture is depicted in different shades of gray, including black and white. Each pixel is a byte (a string of 8 bits). From black to white we have $2^8 = 256$ different tones of gray. These range from black, 00000000, to white, 11111111. A given message of proper size can be embedded in a cover (image) by manipulating the bits of each pixel. Let us assume that in a particular pixel the gray is represented by 00001111. By switching the second bit we obtain a new binary string, 01001111. This change has modified the original picture.

In colour pictures, or RGB images, each pixel is set by a binary string of length 24. The first 8 bits represent the red color, the next 8 bits green and the last 8 bits blue. Each pixel ranges in color from white to black, including combinations of red, green and blue shades. In total, there are $256^3 = 16{,}777{,}216$ possible color shades. We show below three 8-bit strings. Each string represents a color. From an original image, say

00001010  00110101  00011110

we can change the 4th bit from left to right, obtaining

00011010  00100101  00001110

Such a change is by no means equivalent to steganography. As we will see later on, there is a big difference between changing the least significant bit (LSB) and changing the most significant bit.
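The two bit changes described in this section can be checked with a short R sketch. The helper names `flip_bit` and `to_bits` are hypothetical and are introduced only for illustration; bit positions are counted from the left, as in the text.

```r
# Flip bit number `pos` of an 8-bit value, counting from the left
# (1 = most significant bit, 8 = least significant bit).
flip_bit <- function(value, pos) {
  bitwXor(as.integer(value), as.integer(2^(8 - pos)))
}

# 8-character binary string of a single value between 0 and 255.
to_bits <- function(value) {
  paste(rev(as.integer(intToBits(value))[1:8]), collapse = "")
}

# Monochrome example: switch the second bit of 00001111.
gray <- strtoi("00001111", base = 2)
gray_changed <- to_bits(flip_bit(gray, 2))

# RGB example: switch the 4th bit of each of the three channel bytes.
rgb <- strtoi(c("00001010", "00110101", "00011110"), base = 2)
rgb_changed <- vapply(flip_bit(rgb, 4), to_bits, character(1))

print(gray_changed)
print(rgb_changed)
```

The printed strings are 01001111 for the gray pixel and 00011010, 00100101, 00001110 for the RGB pixel, matching the values given in the text.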


3.2.  Image  Compression 

When working with large images, we start having problems handling large files. Some sort of compression is necessary in order to better handle these images. There are two types of compression: lossy and lossless. An example of the first type of compression technique is the JPEG (Joint Photographic Experts Group) image format. For the second type, we have GIF (Graphics Interchange Format) and the 8-bit BMP (Microsoft Windows Bitmap file). In the first case loss of information takes place, while in the second the integrity of the original information is preserved.

The process of JPEG compression requires the calculation of discrete cosine transform coefficients and a quantization matrix. The latter determines the level of compression of the image.

3.3.  Least  Significant  Bit  (LSB) 

The object of steganography is to prevent suspicion about the act of communication, regardless of the medium being used. Small changes in the tone of gray will in general be imperceptible to the human eye. The Least Significant Bit (LSB) technique is a simple approach to modifying an image while keeping the change imperceptible to the human eye. By considering the redundant bits (least significant bits), imperceptible changes take place by changing the 8th bit (from left to right) in the string of eight bits. For example, by changing 00001111 to 00001110, we have applied the least significant bit technique.
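As an illustration of the technique, the following R sketch stores one message bit in the least significant bit of a gray value and reads it back. The names `set_lsb` and `get_lsb` are assumptions made for this example. Setting the bit (rather than flipping it) is what an embedding needs, since the bit to store comes from the message.

```r
# Store one message bit (0 or 1) in the least significant bit of a value
# between 0 and 255: clear the last bit, then write the message bit.
set_lsb <- function(value, bit) {
  bitwOr(bitwAnd(as.integer(value), 254L), as.integer(bit))
}

# Read the least significant bit back.
get_lsb <- function(value) bitwAnd(as.integer(value), 1L)

# The example from the text: 00001111 (15) becomes 00001110 (14).
changed <- set_lsb(15L, 0L)

# Over all 256 gray values, the tone never moves by more than one step.
largest_change <- max(abs(set_lsb(0:255, 1L) - 0:255))

print(changed)
print(largest_change)
```

A change of at most one gray level out of 256 is the reason the modification is expected to stay invisible.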

3.4.  Significant  Bit  Image  Depiction 

Steganography fails in its purpose at the very moment when the existence of the message is compromised. Although steganography is not infallible, its strength lies entirely in outsiders being unaware of the hidden content, whatever it is. When the medium of communication is a picture from which a text or a message can be extracted, its reliability is directly related to the manipulation of the pixels. In particular, by manipulating the LSB, any message is safe as long as it remains imperceptible to the human eye. The following pictures show the level of visual perception in relation to the change of bits in each channel (color) of each pixel. An image is compared before and after the LSB technique has been applied. The simple procedure of switching the LSB of every pixel in each channel produces a different image that cannot be distinguished from the original one. Figure 1 depicts two pictures side by side. The original picture on the left has been altered using the LSB technique, resulting in the picture on the right. We are unable to perceive any changes in the image after the LSB technique.

Figure 2 shows two images. In this section bits are numbered from the least significant (bit number 1) to the most significant (bit number 8). On the left is the resulting image after switching bit number 2 in each pixel for each channel. The image on the right is obtained by switching bit number 3. In the latter image a gradual change in color can be noticed.

Changes made to bits number 4 and 5 are shown in Figure 3, left and right, respectively. The manipulation is evident in both pictures. Note that colors are distorted and degraded.

Finally, the switch of bit number 8 in all channels for every pixel makes the modification evident, so much so that it can be perceived by the human eye. These bits (number 8) are extremely significant and, if steganography is the intended purpose, it would be unwise to choose bit number 8. Below, we show the original image side by side with the 8th-bit-switch depiction.

The importance of using the LSB lies in preserving the objective of steganography, i.e., to hide the fact that communication is taking place.
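The visual effect reported for each bit can be related to the size of the numerical change. The R sketch below is an illustration on a small random matrix, not the report's image; the function name `flip_plane` is hypothetical, and bits are numbered from the least significant (1) to the most significant (8), as in the figures of this section.

```r
# Switch bit number k (1 = least significant, 8 = most significant)
# in every pixel of a matrix of values between 0 and 255.
flip_plane <- function(M, k) {
  flipped <- bitwXor(as.integer(M), as.integer(2^(k - 1)))
  matrix(flipped, nrow = nrow(M), ncol = ncol(M))
}

set.seed(2)
img <- matrix(sample(0:255, 16, replace = TRUE), nrow = 4)

# Largest change in gray level caused by switching each bit in turn.
change <- sapply(1:8, function(k) max(abs(flip_plane(img, k) - img)))
print(change)
```

Switching bit number k moves every pixel by exactly 2^(k-1) levels, so the change doubles with each bit: one level for the LSB and 128 levels, half of the full range, for bit number 8.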


Figure  1:  Original  Depiction  (left);  LSB  Depiction  (right) 


Figure  2:  Switch  of  bit  number  2  (left);  Switch  of  bit  number  3  (right) 


Figure  3:  Switch  of  bit  number  4  (left);  Switch  of  bit  number  5  (right) 


Figure  4:  Switch  of  bit  number  6  (left);  Switch  of  bit  number  7  (right) 


Figure  5:  Original  depiction  (left);  Switch  of  bit  number  8  (right) 


4.  Steganography  Ad  Hoc  Methods 

There are probably a finite number of ways of embedding a message in an image. Whether an embedding is possible depends on the length of the message and the dimensions of the image. Because secrecy is the objective of steganography, the act of communicating must be kept hidden. Below we will discuss some embedding methods. These are not the most efficient, but they are used to accomplish one of the tasks of steganography, embedding a message.

We may consider various strategies for embedding. The embedding must be imperceptible to the human eye. The technique must take place sequentially, at least for now. In other words, there is no (apparent) randomness in the way a message is embedded: there exists an order. One general method is to embed a message line by line. For this, we may discuss many approaches. We are going to limit ourselves to three methods: line-by-line, uniformly distributed and, analogously to the LSB method, LSBP, or the less significant bits per pixel.

4.1.  Linear  Ordered  Embedding 

4.1.1.  Line-by-Line  Linear  Embedding 

Probably the most obvious and simple method of embedding is line-by-line. With an appropriate message length and image dimensions, this strategy can be achieved. The length of the message must be within the total number of bytes (8 bits) contained in the image along the three channels. In other words, if $l$ is the length of the message and $m \times n \times 3$ the dimensions of the image, $l \le 3mn$ must be satisfied.

The general technique is very simple. The message is written by channels, from left to right and from top to bottom. Having selected a channel, say $k$, you may start either from the top-left position of that color matrix, $(1, 1, k)$, or from some point $(i, j, k)$ in that matrix. Continue embedding the message from that point on, from left to right and from top to bottom.


The graph below shows the trace of the embedded message when it is written.

The algorithm starts with the embedding in a channel. Remember that a channel is precisely the matrix of shades of one color. Let us assume that we start at the top-left pixel. In a monochrome picture we are dealing with different shades of gray, including white and black. We start by determining whether the length of the message exceeds the dimensions of the matrix. If the message is feasible, in the sense that its length fits in the matrix dimensions, then we substitute each pixel, represented by 8 bits, by the binary representation of each character in the message. We continue making these substitutions linearly until the end of the message. The substitutions take place one pixel after another, skipping none.

Algorithm 

Let $M$ be a matrix of dimension $r \times c$. Let $s = \{s_1, s_2, \ldots, s_l\}$ be the message to be embedded, of length $|s| = l$, i.e. the number of characters in the sentence. Let $k = l \bmod c$, with $0 \le k < c$. Let $f = \lfloor l/c \rfloor + 1$, the minimum number of rows needed to write the message. In the simplest case, consider a monochrome image of dimension $r \times c$. Let us partition our message $s = \{s_1, s_2, \ldots, s_l\}$ into segments $s^*_1, \ldots, s^*_f$ of equal length $|s^*_i| = c$, for $i = 1, 2, \ldots, f - 1$, with the exception of the last one, which is bounded, $|s^*_f| = k < c$.

Given the matrix

$$M = \begin{bmatrix} t_1 \\ \vdots \\ t_r \end{bmatrix},$$

the rows $\{t_1, \ldots, t_{f-1}\}$ are substituted by $\{s^*_1, \ldots, s^*_{f-1}\}$. The last segment is embedded in row $t_f$, so that the first $|s^*_f|$ elements of $t_f$ are those of $s^*_f$. Let us label the latter row by $t^*_f$. Our covert matrix is then given by

$$M_c = \begin{bmatrix} s^*_1 \\ \vdots \\ s^*_{f-1} \\ t^*_f \\ t_{f+1} \\ \vdots \\ t_r \end{bmatrix}.$$

An algorithm is built so that from the matrix $M$ one obtains the matrix $M_c$. The algorithm, even though it is simple, is inefficient. The embedding is accomplished by substituting a character code (ASCII or binary representation) for the binary value of the pixel. The following algorithm has been built for embedding in monochrome images.


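embed_lines <- function(M, s)
{
    # M: r x c matrix of gray values; s: vector of character codes (0-255)
    nc <- ncol(M)
    l <- length(s)
    stopifnot(l <= length(M))      # the message must fit in the matrix
    f <- floor(l / nc) + 1         # number of rows, as defined above
    for (i in 1:f)
    {
        if (i < f)
        {
            # a full segment s*_i replaces the row t_i
            M[i, ] <- s[((i - 1) * nc + 1):(i * nc)]
        }
        else
        {
            # the last segment fills the first k = l mod c entries of t_f
            k <- l %% nc
            if (k > 0) M[i, 1:k] <- s[((i - 1) * nc + 1):l]
        }
    }
    return(M)
}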
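For comparison, the same line-by-line substitution can be sketched without an explicit loop, together with the matching extraction step. This is a minimal sketch under assumptions: the names `embed_rowwise` and `extract_rowwise` are hypothetical, the image is a small constant matrix, and the receiver is assumed to know the message length `l` in advance.

```r
# Line-by-line embedding: read the matrix row by row, overwrite the first
# length(codes) pixels with the character codes, and rebuild the matrix.
embed_rowwise <- function(M, codes) {
  stopifnot(length(codes) <= length(M))   # the message must fit
  a <- c(t(M))                            # rows of M, one after another
  a[seq_along(codes)] <- codes
  matrix(a, nrow = nrow(M), ncol = ncol(M), byrow = TRUE)
}

# Extraction: read the first l pixels back in the same order.
extract_rowwise <- function(Mc, l) {
  c(t(Mc))[seq_len(l)]
}

M <- matrix(200L, nrow = 4, ncol = 5)     # hypothetical monochrome image
codes <- utf8ToInt("hidden")              # 6 characters, rows of 5 pixels
Mc <- embed_rowwise(M, codes)
recovered <- intToUtf8(extract_rowwise(Mc, length(codes)))
print(Mc)
print(recovered)
```

The first row of `Mc` holds the first five character codes and the sixth code lands in the first pixel of the second row, while every other pixel keeps its original value.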

4.1.2.  Uniform  Spread  Embedding  (USE) 

The uniform spread of a message along the matrix $M$ is another form of linear embedding. The message is spread over the matrix in an orderly fashion. Consider a message $s$ and a matrix $M$ of dimension $r \times c$, with the length of the message satisfying $|s| \le rc$. Let us take the floor $k = \lfloor rc/|s| \rfloor$. Starting from the top-left pixel, if the character $s_m$, $m = 1, 2, \ldots, n$, is substituted at pixel $(i, j)$, the next character will be embedded at $(\varphi_r(i, j, k), \varphi_c(i, j, k))$. The functions $\varphi_r$ and $\varphi_c$ return the corresponding indices of the pixel at which the next embedding is going to take place.

Algorithm 

In R we can implement the algorithm as follows. Consider the array $a = a_1 a_2 \ldots a_{rc}$, obtained from the matrix $M$ by placing its rows one after another, from top to bottom, each row read from left to right. Note that $|a| = rc$. Starting from position 1, we substitute a character of the message $s$ into $a$ every $k$ steps. A new matrix $N$ is built with the resulting substitutions. Using the R language we may accomplish our goal as follows.


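embed_use <- function(M, s)
{
    # M: r x c matrix of gray values; s: vector of character codes (0-255)
    a <- c(t(M))                       # rows of M, one after another
    k <- floor(length(a) / length(s))
    # one character every k positions, exactly length(s) of them
    a[seq(1, by = k, length.out = length(s))] <- s
    N <- matrix(a, nrow = nrow(M), ncol = ncol(M), byrow = TRUE)
    return(N)
}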


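The functions $\varphi_r$ and $\varphi_c$ provide the indices for the next embedding, given the current pixel $(i, j)$ and the step $k$. The number of columns is passed as nc, so that the name c remains available for the R function c(). The function $\varphi_r$ is outlined as follows.

phi_r <- function(i, j, k, nc)
{
    if (j + k <= nc)
    {
        return(i)
    }
    else
    {
        # move down by as many rows as the step wraps around
        return(i + (j + k - 1) %/% nc)
    }
}

For the function $\varphi_c$ we have

phi_c <- function(i, j, k, nc)
{
    if (j + k <= nc)
    {
        return(j + k)
    }
    else
    {
        return((j + k - 1) %% nc + 1)
    }
}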
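A round trip makes the uniform spread concrete: the receiver recomputes the same step $k$ and reads back every $k$-th pixel. The sketch below is an illustration under assumptions; the names `use_positions`, `embed_spread` and `extract_spread` are hypothetical, and the receiver is assumed to know the number of characters in the message.

```r
# Row-major positions used by the uniform spread: one every k pixels.
use_positions <- function(n_pixels, n_chars) {
  stopifnot(n_chars >= 1, n_chars <= n_pixels)
  k <- floor(n_pixels / n_chars)
  seq(1, by = k, length.out = n_chars)
}

embed_spread <- function(M, codes) {
  a <- c(t(M))                                    # read row by row
  a[use_positions(length(a), length(codes))] <- codes
  matrix(a, nrow = nrow(M), ncol = ncol(M), byrow = TRUE)
}

extract_spread <- function(N, n_chars) {
  a <- c(t(N))
  a[use_positions(length(a), n_chars)]
}

M <- matrix(200L, nrow = 4, ncol = 5)             # 20 pixels
codes <- utf8ToInt("abc")                         # k = floor(20 / 3) = 6
N <- embed_spread(M, codes)
recovered <- intToUtf8(extract_spread(N, length(codes)))
print(N)
print(recovered)
```

With 20 pixels and 3 characters the row-major positions are 1, 7 and 13, that is, pixels (1,1), (2,2) and (3,3) of the 4 x 5 matrix.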


4.1.3.  Pixel  LSBP 

Another technique for embedding is the LSBP (less significant bits per pixel). This takes each character in a message $s$ and spreads it over the less significant bits of a pixel. In the case of a monochrome image, this technique becomes the line-by-line linear technique. However, in an RGB image the binary representation of a character is split into combinations of 3,3,2 or 3,2,3 or 2,3,3 bits, to be distributed over the Red, Green and Blue channels, respectively.

Consider the character $s_i$ from the message $s$. Its binary representation $(x_1, x_2, \ldots, x_8)$ will be embedded in the pixel $(v, w)$. This pixel has three bytes

00101011  01101111  11001000

containing each color shade. By substituting our character and using configuration 3,3,2, we obtain

00101$x_1x_2x_3$  01101$x_4x_5x_6$  110010$x_7x_8$


Algorithm 

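Using R coding we make the embedding per pixel. In the algorithm below the image p, the message bits s (the binary representation of one character) and the pixel position (v, w) are sent for the embedding. In each channel the low bits are cleared with a mask and the message bits are written in their place.

embed_pixel <- function(p, s, v, w)
{
    # p: r x c x 3 array of values 0-255
    # s: the 8 bits (0/1) of one character, most significant bit first
    to_int <- function(b) as.integer(sum(b * 2^((length(b) - 1):0)))
    pr <- as.integer(p[v, w, 1])
    pg <- as.integer(p[v, w, 2])
    pb <- as.integer(p[v, w, 3])
    p[v, w, 1] <- bitwOr(bitwAnd(pr, 248L), to_int(s[1:3]))   # bits 6:8 of red
    p[v, w, 2] <- bitwOr(bitwAnd(pg, 248L), to_int(s[4:6]))   # bits 6:8 of green
    p[v, w, 3] <- bitwOr(bitwAnd(pb, 252L), to_int(s[7:8]))   # bits 7:8 of blue
    return(p)
}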


The latter algorithm is not complete, of course. It is one part of a sequence of steps. It could be used with a modification of either the line-by-line or the USE technique.
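The 3,3,2 configuration can be followed on the pixel used in the text. The R sketch below embeds one character in a single RGB pixel and reads it back; the names `embed_332` and `extract_332` and the choice of the character "A" are assumptions made for illustration.

```r
# Embed one character code (0-255) in an RGB pixel with configuration 3,3,2:
# 3 bits in red, 3 bits in green and 2 bits in blue.
embed_332 <- function(rgb, code) {
  rgb <- as.integer(rgb)
  code <- as.integer(code)
  c(bitwOr(bitwAnd(rgb[1], 248L), code %/% 32L),          # x1 x2 x3
    bitwOr(bitwAnd(rgb[2], 248L), (code %/% 4L) %% 8L),   # x4 x5 x6
    bitwOr(bitwAnd(rgb[3], 252L), code %% 4L))            # x7 x8
}

# Recover the character code from the low bits of the three channels.
extract_332 <- function(rgb) {
  rgb <- as.integer(rgb)
  bitwAnd(rgb[1], 7L) * 32L + bitwAnd(rgb[2], 7L) * 4L + bitwAnd(rgb[3], 3L)
}

# The pixel from the text: 00101011 01101111 11001000, i.e. 43, 111, 200.
pixel <- strtoi(c("00101011", "01101111", "11001000"), base = 2)

# "A" has code 65 = 01000001, split as 010 | 000 | 01.
stego <- embed_332(pixel, utf8ToInt("A"))
recovered <- intToUtf8(extract_332(stego))
print(stego)
print(recovered)
```

The pixel becomes 00101010 01101000 11001001, that is, 42, 104 and 201. The red and blue channels move by one level, while the green channel moves by seven, which hints at why this scheme is more visible than a pure LSB change.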

4.1.4.  LSB 

All the techniques shown so far suffer from a great flaw: they make steganography fail. On a large, wide monitor, 32 inches or larger, a line of dots may be seen across the image. These dots are the changes of color shade caused by the embedding. The change is visible and evident, which violates the purpose of steganography.

In order to prevent this flaw, we are going to use the LSB technique. In the case of the monochrome image, the binary representation of a character is distributed along 8 pixels. The following character is spread over the next 8 pixels, and so on. The weakness of this technique is that the message cannot be considerably long: we have space constraints. Measured in bits, the message must satisfy $|s| \le 3rc$ in an RGB picture and $|s| \le rc$ in a monochrome image. Since each character takes 8 bits, at most $\lfloor rc/8 \rfloor$ characters fit in a monochrome image.

Algorithm 

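We can combine a number of techniques that lead us to a considerable level of efficiency.

embed_lsb <- function(M, s, k = 1)
{
    # M: r x c matrix of gray values (0-255); s: vector of message bits (0/1)
    # k is a stepping constant: one message bit every k pixels
    a <- as.integer(c(t(M)))
    idx <- seq(1, by = k, length.out = length(s))
    stopifnot(max(idx) <= length(a))
    # clear the least significant bit and write the message bit
    a[idx] <- bitwOr(bitwAnd(a[idx], 254L), as.integer(s))
    N <- matrix(a, nrow = nrow(M), ncol = ncol(M), byrow = TRUE)
    return(N)
}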


The  above  algorithm  shows  the  technique  of  spreading  and  the  use  of  the  LSB. 
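A complete round trip for the spread LSB technique might look as follows. This is a sketch under assumptions, not the report's implementation: all function names are hypothetical, the image is a constant 8 x 8 matrix, and the receiver is assumed to know both the number of message bits and the step `k`.

```r
# Character codes to bits (8 per character, most significant bit first).
codes_to_bits <- function(codes) {
  unlist(lapply(codes, function(ch) rev(as.integer(intToBits(ch))[1:8])))
}

# Bits back to character codes.
bits_to_codes <- function(bits) {
  m <- matrix(bits, ncol = 8, byrow = TRUE)
  as.integer(m %*% 2^(7:0))
}

# Write one message bit into the LSB of every k-th pixel (row-major order).
spread_lsb <- function(M, bits, k = 1) {
  a <- as.integer(c(t(M)))
  idx <- seq(1, by = k, length.out = length(bits))
  stopifnot(max(idx) <= length(a))
  a[idx] <- bitwOr(bitwAnd(a[idx], 254L), as.integer(bits))
  matrix(a, nrow = nrow(M), ncol = ncol(M), byrow = TRUE)
}

# Read the LSB of every k-th pixel back.
read_lsb <- function(N, n_bits, k = 1) {
  a <- as.integer(c(t(N)))
  bitwAnd(a[seq(1, by = k, length.out = n_bits)], 1L)
}

M <- matrix(200L, nrow = 8, ncol = 8)          # 64 pixels
bits <- codes_to_bits(utf8ToInt("hi"))         # 16 bits
N <- spread_lsb(M, bits, k = 2)
recovered <- intToUtf8(bits_to_codes(read_lsb(N, length(bits), k = 2)))
print(recovered)
print(max(abs(N - M)))
```

No pixel changes by more than one gray level, at the price of capacity: the 64 pixels hold at most 8 characters with `k = 1`.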

4.2.  Diagonal 

Unlike the linear techniques, our embedding can also be traced diagonally. We are going to mention four main techniques and their variants: Top-To-Bottom-Right-To-Left, Top-To-Bottom-Left-To-Right, Bottom-To-Top-Right-To-Left and Bottom-To-Top-Left-To-Right. In total there are eight different cases.

4.2.1.  Top-To-Bottom-Right-To-Left-Left  Corner 

We  can  start  our  embedding  according  to  the  following  sequence  of  indices  {(1,1),  (1,2), 
(2,1),  (1,3),  (2,2),  (3,1),...}.  The  following  diagram  shows  more  details. 
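The index sequence above can be generated by sorting the pixels by anti-diagonal. The R sketch below is one possible way to do so; the function name `diag_order` is hypothetical, and the number of rows and columns are written `nr` and `nc`. The other variants of this section can be obtained from the same idea by reversing the order within each diagonal or by mirroring the row or column index.

```r
# Order of the pixels of an nr x nc matrix along anti-diagonals,
# starting at (1,1): (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...
diag_order <- function(nr, nc) {
  idx <- expand.grid(i = seq_len(nr), j = seq_len(nc))
  # pixels on the same anti-diagonal share i + j; within it, sort by row
  idx <- idx[order(idx$i + idx$j, idx$i), ]
  unname(as.matrix(idx))
}

ord <- diag_order(4, 5)
print(head(ord, 6))

# Embedding along this trace: use the first rows of ord as matrix indices.
M <- matrix(200L, nrow = 4, ncol = 5)
codes <- utf8ToInt("abc")
M[ord[seq_along(codes), , drop = FALSE]] <- codes
print(M)
```

Indexing a matrix with a two-column matrix of (row, column) pairs, as in the last assignment, lets the same embedding code work with any of the traversal orders.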


4.2.2.  Top-To-Bottom-Left-To-Right-Left  Corner 

We  can  start  our  embedding  according  to  the  following  sequence  of  indices  {(1,1),  (2,1), 
(1,2),  (3,1),  (2,2),  (1,3),...}.  The  following  diagram  shows  more  details. 


4.2.3.  Top-To-Bottom-Left-To-Right-Right  Corner 

We can start our embedding according to the following sequence of indices {(1,c), (1,c-1), (2,c), (1,c-2), (2,c-1), (3,c),...}. The following diagram shows more details.


4.2.4.  Top-To-Bottom-Left-To-Right-Right  Corner 

We can start our embedding according to the following sequence of indices {(1,c), (2,c), (1,c-1), (3,c), (2,c-1), (1,c-2),...}. The following diagram shows more details.


4.2.5.  Bottom-To-Top-Left-To-Right-Left  Corner 

We can start our embedding according to the following sequence of indices {(r,1), (r-1,1), (r,2), (r-2,1), (r-1,2), (r,3),...}. The following diagram shows more details.


4.2.6.  Bottom-To-Top-Right-To-Left-Left  Corner 

We can start our embedding according to the following sequence of indices {(r,1), (r,2), (r-1,1), (r,3), (r-1,2), (r-2,1),...}. The following diagram shows more details.


4.2.7.  Bottom-To-Top-Left-To-Right-Right  Corner 

We can start our embedding according to the following sequence of indices {(r,c), (r,c-1), (r-1,c), (r,c-2), (r-1,c-1), (r-2,c),...}. The following diagram shows more details.


4.2.8.  Bottom-To-Top-Right-To-Left-Right  Corner 

We can start our embedding according to the following sequence of indices {(r,c), (r-1,c), (r,c-1), (r-2,c), (r-1,c-1), (r,c-2),...}. The following diagram shows more details.


Part  III. 
Steganalysis 


5.  Discrete  Cosine  Transform  (DCT) 


5.1.  DCT  coefficients  in  general 

The Discrete Cosine Transform (DCT) is used to transform the values of successive 8 x 8 blocks of pixels from the image (these being set to appropriate values) into blocks of DCT coefficients of the same dimensions. The DCT coefficients can be obtained using the Type II DCT, given by the equation below for an integer $N$:


$$D(i,j) = \frac{2}{N}\, c(i)\, c(j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} m(x,y)\, d_N(x,i)\, d_N(y,j) \tag{1}$$

where

$$d_N(a,b) = \cos\left[\frac{(2a+1)\, b\, \pi}{2N}\right] \quad \text{and} \quad c(n) = \begin{cases} 1/\sqrt{2} & \text{if } n = 0, \\ 1 & \text{if } n > 0, \end{cases} \tag{2}$$

and $m(x,y)$ are the entries of a matrix $m$. Type II is the most frequently used DCT in many applications.


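5.2. DCT of an 8 x 8 block matrix

In the particular case N = 8, for an 8 x 8 block, the equation reduces to

$$D(i,j) = \frac{1}{4}\, C(i)\, C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} m(x,y)\, d_8(x,i)\, d_8(y,j) \tag{3}$$

with

$$d_8(a,b) = \cos\left[\frac{(2a+1)\, b\, \pi}{16}\right] \quad \text{and} \quad C(n) = \begin{cases} 1/\sqrt{2} & \text{if } n = 0, \\ 1 & \text{if } n > 0. \end{cases} \tag{4}$$

The matrix T obtained from equation (3), with entries T(i+1, x+1) = (1/2) C(i) d_8(x, i)
for i, x = 0, ..., 7, is shown below.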


            [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]        [,8]
[1,]  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339
[2,]  0.49039264  0.41573481  0.27778512  0.09754516 -0.09754516 -0.27778512 -0.41573481 -0.49039264
[3,]  0.46193977  0.19134172 -0.19134172 -0.46193977 -0.46193977 -0.19134172  0.19134172  0.46193977
[4,]  0.41573481 -0.09754516 -0.49039264 -0.27778512  0.27778512  0.49039264  0.09754516 -0.41573481
[5,]  0.35355339 -0.35355339 -0.35355339  0.35355339  0.35355339 -0.35355339 -0.35355339  0.35355339
[6,]  0.27778512 -0.49039264  0.09754516  0.41573481 -0.41573481 -0.09754516  0.49039264 -0.27778512
[7,]  0.19134172 -0.46193977  0.46193977 -0.19134172 -0.19134172  0.46193977 -0.46193977  0.19134172
[8,]  0.09754516 -0.27778512  0.41573481 -0.49039264  0.49039264 -0.41573481  0.27778512 -0.09754516


This matrix is orthogonal, i.e., $T^{-1} = T'$, or $TT' = T'T = I$, where I is the
identity matrix.

To apply the equation above to a given gray-scale image M, we take an 8 x 8 block
$M_{xy}$ of the image; the DCT coefficient matrix of that block is given by

$$D_{xy} = T M_{xy} T'$$

An 8 x 8 block contains values ranging from 0 to 255. Since the DCT domain runs over
values symmetric about the origin, we level off the block by subtracting 128 from every
entry. This provides the symmetry required of the image's values.

The resulting values are called the un-quantized DCT coefficients, and the matrix is then
in the DCT domain.
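As an illustration of this section, the sketch below builds the Type II matrix for N = 8
in base R and applies it to one block. The function name dct_matrix, the variable names
and the randomly generated block are assumptions made for this example; the report's own
code for this step is not shown.

```r
# Type II DCT matrix: entry [i + 1, x + 1] = sqrt(2/N) * C(i) * cos((2x + 1) i pi / (2N))
dct_matrix <- function(N = 8) {
  i <- 0:(N - 1)                          # frequency index (rows)
  x <- 0:(N - 1)                          # sample index (columns)
  Cn <- ifelse(i == 0, 1 / sqrt(2), 1)    # C(n) from the definition above
  sqrt(2 / N) * Cn * cos(outer(i, 2 * x + 1) * pi / (2 * N))
}

Tm <- dct_matrix(8)                       # named Tm because T is an alias of TRUE in R

# Illustrative 8 x 8 gray-scale block with values in 0..255
set.seed(1)
M <- matrix(sample(0:255, 64, replace = TRUE), 8, 8)

D <- Tm %*% (M - 128) %*% t(Tm)           # level off by 128, then D = T M T'
M_back <- t(Tm) %*% D %*% Tm + 128        # inverse transform, using T^{-1} = T'
```

Because the matrix is orthogonal, the inverse transform only needs the transpose, which
is why the block is recovered without solving a linear system.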


5.3.  DCT  coefficients  of  an  image 


In general we can obtain DCT coefficients for a block of any dimension. One way to obtain
the DCT coefficient matrix of the whole image is to partition the image into blocks of
equal dimensions and to transform each block in turn. This requires many loops and is
time-consuming. Instead, we can construct block diagonal matrices that lead directly to the
DCT coefficients of the image. Two matrices are used, $T_l$ for the left multiplication
and $T_r$ for the right multiplication. They are given by

$$T_l = \begin{bmatrix} T & 0 & \cdots & 0 \\ 0 & T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T \end{bmatrix} \quad \text{and} \quad T_r = \begin{bmatrix} T & 0 & \cdots & 0 \\ 0 & T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & T \end{bmatrix} \tag{5}$$

$T_l$ and $T_r$ are block diagonal matrices whose diagonal blocks are copies of the matrix
T. There are precisely l such blocks in $T_l$ and r such blocks in $T_r$, where l is the
number of rows of the image divided by 8 and r is the number of columns divided by 8.


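Let us assume that the image M has appropriate values, i.e., values leveled off from
[0, 255] by subtracting 128, and dimensions that are multiples of 8. The DCT coefficient
matrix of the image,

$$D = T_l M T_r', \tag{6}$$

consists of consecutive 8 x 8 blocks of DCT coefficients. The matrices D and M are of the
form

$$D = \begin{bmatrix} D_{11} & D_{12} & \cdots & D_{1r} \\ D_{21} & D_{22} & \cdots & D_{2r} \\ \vdots & \vdots & & \vdots \\ D_{l1} & D_{l2} & \cdots & D_{lr} \end{bmatrix} \quad \text{and} \quad M = \begin{bmatrix} M_{11} & M_{12} & \cdots & M_{1r} \\ M_{21} & M_{22} & \cdots & M_{2r} \\ \vdots & \vdots & & \vdots \\ M_{l1} & M_{l2} & \cdots & M_{lr} \end{bmatrix}$$

where $D_{xy} = T M_{xy} T'$, $D = (D_{xy})_{xy}$ and $M = (M_{xy})_{xy}$.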
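The block diagonal construction can be sketched in base R with a Kronecker product, since
kronecker(diag(l), T) places l copies of T on the diagonal. The names dct_matrix and
image_dct and the random test image are assumptions introduced for this example only.

```r
# Type II DCT matrix for N x N blocks (same definition as in Section 5.1)
dct_matrix <- function(N = 8) {
  i <- 0:(N - 1)
  Cn <- ifelse(i == 0, 1 / sqrt(2), 1)
  sqrt(2 / N) * Cn * cos(outer(i, 2 * (0:(N - 1)) + 1) * pi / (2 * N))
}

# DCT coefficients of a whole (already leveled off) image in two matrix products
image_dct <- function(M, N = 8) {
  stopifnot(nrow(M) %% N == 0, ncol(M) %% N == 0)
  Tm <- dct_matrix(N)
  Tl <- kronecker(diag(nrow(M) / N), Tm)   # l copies of T on the diagonal
  Tr <- kronecker(diag(ncol(M) / N), Tm)   # r copies of T on the diagonal
  Tl %*% M %*% t(Tr)
}

# Illustrative 16 x 24 image (l = 2, r = 3), leveled off by 128
set.seed(2)
M <- matrix(sample(0:255, 16 * 24, replace = TRUE), 16, 24) - 128
D <- image_dct(M)

# Block (2, 3) of D agrees with the block-wise transform T M_23 T'
Tm <- dct_matrix(8)
blk <- Tm %*% M[9:16, 17:24] %*% t(Tm)
all.equal(D[9:16, 17:24], blk)
```

This trades the explicit loops over blocks for two larger matrix products; for very large
images a sparse representation of the block diagonal matrices may be preferable.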


5.4.  Quantization  Index  Modulation  (QIM) 

Basic  principles 

Consider the case of embedding a message m in a host signal $x \in \mathbb{R}^N$. This
host signal can be a vector of pixel values or of Discrete Cosine Transform coefficients
from an image. We wish to embed at a rate of $R_m$ bits per dimension (bits per host signal
sample), so we can think of m as an integer, where

$$m \in \{1, 2, \ldots, 2^{N R_m}\}.$$

An embedding function s(x, m) maps the host signal x and the information to embed, m, to
a composite signal $s \in \mathbb{R}^N$. The embedding should not degrade the host signal,
so we require that some distortion measure D(s, x) between the composite and host signals
stays small.

Embedding methods fall into two general classes: host-interference nonrejecting methods
and host-interference rejecting methods.

Host-interference nonrejecting methods have the general property that the host signal
is effectively a source of interference in the system. The simplest of such methods have
purely additive embedding functions of the form

$$s(x, m) = x + w(m)$$

where w(m) is typically a pseudo-noise sequence; such methods are often called additive
spread-spectrum methods. It is very common to write w(m) = a(m)v, where v is a unit-energy
spreading vector and a(m) is a scalar function of the message.
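A minimal sketch of additive spread-spectrum embedding in base R follows. The host signal,
the amplitude map a(m) for a one-bit message and all variable names are assumptions chosen
for illustration only.

```r
set.seed(3)
N <- 64
x <- rnorm(N, mean = 0, sd = 10)                     # hypothetical host signal
v <- sample(c(-1, 1), N, replace = TRUE) / sqrt(N)   # unit-energy pseudo-noise vector

# Hypothetical amplitude map for a one-bit message: +2 for m = 1, -2 otherwise
a <- function(m) ifelse(m == 1, 2, -2)

# Additive embedding s(x, m) = x + a(m) v
embed_ss <- function(x, m) x + a(m) * v

s <- embed_ss(x, 1)
distortion <- sum((s - x)^2) / N    # mean squared difference between s and x
corr <- sum(s * v)                  # equals a(m) + sum(x * v)
```

The correlation a receiver would compute contains the term sum(x * v), which comes from
the host signal alone. This is the sense in which the host acts as interference for
methods of this class.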

5.5.  Properties  of  an  ensemble  function 

The embedding should not degrade the host signal to the point that the embedding can be
detected. Therefore, a distortion measure D(s, x) between the composite and host signals
should be relatively small. In theory, one might measure the difference between the host
signal and the composite signal.

For example, one might consider the square-error distortion measure

$$D(s,x) = \frac{1}{N}\, \|s - x\|^2. \tag{8}$$

Moreover, we can express the composite signal as the sum of the host signal and a
nuisance term, i.e.,

$$s(x,m) = x + e(x,m). \tag{9}$$

The nuisance can be expressed as

$$e(x,m) = s(x,m) - x. \tag{10}$$

To develop the QIM concept, the mapping s(x, m) can be viewed as an ensemble of functions
of x, indexed by m. If the embedding is to cause only small distortion, then we may state
the approximate-identity property

$$s(x,m) \approx x, \quad \forall m.$$

The fact that the system needs to be robust to perturbations suggests that the points in
the range of one function in the ensemble should be far away, in some sense, from the
points in the range of any other function.

In practice, the receiver sees only the output of the mapping s and has little or no
information about the source values x and m.

Ideally, the ranges of the functions would have no points in common. Otherwise, some
values of s would not determine m uniquely. It is precisely this non-intersection property
that leads to host-signal interference rejection.

The non-intersection property, together with the approximate-identity property (which
suggests that the ranges of each of the functions "cover" the space of possible, or at
least highly probable, host-signal values x), suggests that the functions be discontinuous.

Quantizers are just such a class of discontinuous, approximate-identity functions. QIM
refers to embedding information by first modulating an index or sequence of indices
with the embedded information and then quantizing the host signal with the associated
quantizer or sequence of quantizers.
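To make the idea concrete, here is a minimal sketch of binary QIM with two uniform scalar
quantizers in base R. It assumes a step size delta and takes the quantizer for bit 1 to be
the quantizer for bit 0 shifted by delta / 2; the function names qim_embed and qim_decode
are hypothetical.

```r
# Quantize every host sample with the quantizer selected by its message bit.
#   quantizer for bit 0: multiples of delta
#   quantizer for bit 1: multiples of delta, shifted by delta / 2
qim_embed <- function(x, bits, delta) {
  offset <- bits * delta / 2
  delta * round((x - offset) / delta) + offset
}

# Decode by choosing, for every sample, the quantizer with the nearest point
qim_decode <- function(s, delta) {
  d0 <- abs(s - qim_embed(s, 0, delta))
  d1 <- abs(s - qim_embed(s, 1, delta))
  as.integer(d1 < d0)
}

set.seed(4)
x <- runif(50, -100, 100)                  # hypothetical host samples
bits <- sample(0:1, 50, replace = TRUE)    # hypothetical message bits
s <- qim_embed(x, bits, delta = 8)
decoded <- qim_decode(s, delta = 8)
```

The two sets of reconstruction points do not intersect, so the bit can be read from s
alone, and the embedding moves each sample by at most delta / 2. A perturbation smaller
than delta / 4 leaves the decoded bits unchanged.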

These quantizers are analysed from a probabilistic point of view. In what follows we
develop the probabilistic framework, which will allow us to define tests and
non-parametric structures for stego image identification.

Let us define the probability framework. First, let $\mathcal{X}$ be the space of possible
values for the host signal (cover) x. Let X be a random variable defined over
$\mathcal{X}$ with pmf $P_x$. Given that X is a random vector, let us assume that its
components are i.i.d. Let $x_q$ denote the quantized image obtained by plain quantization
(an image with no hidden message). Furthermore, let $x_{QIM}$ denote the QIM-stego, which
is the quantized image with a hidden message. Given the cover x, let $x_{DM}$ denote the
DM-stego, the stego image produced with dither modulation (DM). The design of a parametric
hypothesis test requires the probability mass functions of $x$, $x_q$, $x_{QIM}$ and
$x_{DM}$. We are going to define all of these in terms of the pmf of x.

For plain quantization, let us define the quantizer output $x_k$ as

$$x_k = k\Delta^*, \quad \Delta^* > 0. \tag{11}$$

Consider the following sample space (that it is indeed a sample space remains to be
established):

$$A_{\Delta^*} = \{x_k : x_k = k\Delta^*,\ k \in \mathbb{Z}\}. \tag{12}$$

Here $\mathbb{Z}$ is the set of integers.


5.6.  Calculating  probabilities  for  quantized  values 

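Let us define the random variable $X_q$ with pmf $P_{x_q}$ and range equal to
$A_{\Delta^*}$. Let us define the quantization set

$$\mathcal{X}(a,\Delta^*) \triangleq [a - \Delta^*/2,\ a + \Delta^*/2), \quad a \in A_{\Delta^*}. \tag{13}$$

The probability of the quantized value a is

$$P_{x_q}(a) = \sum_{x \in \mathcal{X}} P_x(x)\, \delta_{\mathcal{X}(a,\Delta^*)}(x)$$

with indicator function

$$\delta_{\mathcal{X}(a,\Delta^*)}(x) = \begin{cases} 1 & \text{if } x \in \mathcal{X}(a,\Delta^*), \\ 0 & \text{if } x \notin \mathcal{X}(a,\Delta^*). \end{cases} \tag{14}$$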
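The sum in the definition of the quantized pmf can be sketched in base R as follows. The
function name quantized_pmf and the small uniform example are assumptions made for
illustration; values holds the support of the cover pmf and probs the matching
probabilities.

```r
# pmf of the plainly quantized signal:
# every x contributes its probability to the lattice point a = k * step
# whose cell [a - step/2, a + step/2) contains x.
quantized_pmf <- function(values, probs, step) {
  a <- step * floor(values / step + 0.5)
  vapply(split(probs, a), sum, numeric(1))
}

# Hypothetical cover pmf: uniform on 0..9, quantized with step 4
values <- 0:9
probs <- rep(0.1, 10)
pq <- quantized_pmf(values, probs, step = 4)
pq
```

The names of the result are the lattice points. For instance, the cell [2, 6) of the
point 4 collects the probability of the values 2, 3, 4 and 5.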


Let us now define a new choice of quantizers to hide binary data in $\{0, 1\}^N$. Here
N is regarded as the length of a string of information (a message). We split the original
quantizer into 2 subsets, each with step-size $\Delta = 2\Delta^*$. Let us define the set

$$\mathcal{X}(s,\Delta) = [s - \Delta/2,\ s + \Delta/2). \tag{15}$$

Each bit value is associated with one quantizer: the quantizer associated with bit 1 is
identical to that for bit 0, but shifted by $\Delta/2$. Assuming the probability of 0 is
equal to that of 1, we have

$$P_{x_{QIM}}(s) = \frac{1}{2} \sum_{x \in \mathcal{X}} P_x(x)\, \delta_{\mathcal{X}(s,\Delta)}(x), \tag{16}$$

where $\delta_{\mathcal{X}(s,\Delta)}$ is the indicator function.


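For dither modulation, we let d be a pseudo-random variable uniformly distributed over
$[-\Delta/4, \Delta/4)$, so that the output covers all the values of the input and does not
leave tell-tale signs of quantization. In this range,

$$P_D(a \le d < b) = \frac{2(b - a)}{\Delta} = \frac{2\epsilon}{\Delta}, \quad \epsilon = b - a, \tag{17}$$

with $a, b \in [-\Delta/4, \Delta/4]$. With this dithering, any $x_{DM}$ is valid, subject
to the granularity of the system. For every received $x_{DM}$ there is one and only one
valid value of d that could have produced that value of $x_{DM}$.

For any valid $x_{DM}$,

$$\begin{aligned} P_{x_{DM}}(x_{DM}) &= P(\{B = 0, 1\} \text{ and } \mathcal{X}(x_{DM}, \Delta) \text{ and } \{d \text{ required}\}) \\ &= P(\{B = 0, 1\})\, P_x(\mathcal{X}(x_{DM}, \Delta))\, P_D(\{d \text{ required}\}) \\ &= \frac{1}{2}\, P_x(\mathcal{X}(x_{DM}, \Delta))\, \frac{2\epsilon}{\Delta} \\ &= \frac{\epsilon}{\Delta}\, P_x(\mathcal{X}(x_{DM}, \Delta)). \end{aligned}$$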

5.7.  Effect  of  embedding 

Let us define $n_i$ and $n_i^*$ as the frequency of color index i before and after
embedding, respectively. The relation below holds:

$$|n_{2i} - n_{2i+1}| \ge |n^*_{2i} - n^*_{2i+1}|, \tag{18}$$

which means that the difference between the frequencies of adjacent color values is reduced
by the embedding process. In general, for $n_{2i} > n_{2i+1}$ the bits of the hidden
message change values counted in $n_{2i}$ into values counted in $n_{2i+1}$ more often than
the other way around.
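The effect can be observed with a small simulation in base R. The sketch below assumes
that the message bits are equally likely to be 0 or 1 and that they replace the least
significant bit of every value; the skewed cover distribution and all names are
illustrative assumptions.

```r
set.seed(5)
# Hypothetical cover: color indices 0..7 with deliberately unequal frequencies
cover <- sample(0:7, 5000, replace = TRUE, prob = c(8, 1, 6, 2, 5, 1, 7, 3))
bits <- sample(0:1, length(cover), replace = TRUE)   # message bits

# Replace the least significant bit of every value by a message bit
stego <- cover - cover %% 2L + bits

n <- tabulate(cover + 1L, nbins = 8)        # n_0, ..., n_7 before embedding
n_star <- tabulate(stego + 1L, nbins = 8)   # n*_0, ..., n*_7 after embedding

# |n_2i - n_2i+1| for the four pairs (0,1), (2,3), (4,5), (6,7)
pair_diff <- function(f) abs(f[c(1, 3, 5, 7)] - f[c(2, 4, 6, 8)])
rbind(before = pair_diff(n), after = pair_diff(n_star))
```

Embedding only moves values within a pair (2i, 2i+1), so the total count of each pair is
unchanged while the two counts inside the pair are pulled towards each other.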

5.8.  Expected  Value  Estimate 

Instead of using color frequencies, we are going to use the DCT coefficients. The
distortion will be measured with the $\chi^2$-test. Let $n_i$ be the frequency of DCT
coefficient i in the image. Because the test uses the stego-image, the expected
distribution $y_i^*$ for the test has to be computed from the image itself. As a result
the arithmetic mean is used, i.e.,

$$y_i^* = \frac{n_{2i} + n_{2i+1}}{2} \tag{19}$$

to determine the empirical distribution. This average is an estimate of the expected value.
The expected distribution is compared against the observed distribution $y_i = n_{2i}$.

The statistic obtained has a $\chi^2$ distribution and is defined by

$$\chi^2 = \sum_{i=1}^{\nu+1} \frac{(y_i - y_i^*)^2}{y_i^*}, \tag{20}$$

where $\nu$ is the number of degrees of freedom. Large values of $\chi^2$ suggest a
nonrandom condition, or a low level of randomness; as a consequence the source is probably
an original image. Conversely, small values indicate a high degree of randomness, which is
often associated with encrypted hidden information.

The probability p of embedding is given by

$$p = \Pr(X > \chi^2), \tag{21}$$

where $X \sim \chi^2_\nu$. We can compute this probability for particular regions in the
image.
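A minimal sketch of this test in base R is given below. The function name
chisq_embedding_prob is hypothetical, the input is assumed to be a vector of frequencies
$n_0, n_1, \ldots$ with an even number of entries, and pairs whose expected frequency is
zero are skipped, which is a choice made for this example.

```r
# Probability of embedding from a vector of frequencies n_0, n_1, ..., n_(2k-1)
chisq_embedding_prob <- function(n) {
  even <- n[seq(1, length(n), by = 2)]     # observed y_i = n_2i
  odd  <- n[seq(2, length(n), by = 2)]     # n_2i+1
  expected <- (even + odd) / 2             # y*_i, the arithmetic mean of each pair
  keep <- expected > 0                     # skip empty pairs (assumption)
  stat <- sum((even[keep] - expected[keep])^2 / expected[keep])
  nu <- sum(keep) - 1                      # degrees of freedom
  p <- pchisq(stat, df = nu, lower.tail = FALSE)   # Pr(X > chi^2)
  list(statistic = stat, df = nu, p = p)
}

# Perfectly equalized pairs: the statistic is 0 and p is 1
res_equal <- chisq_embedding_prob(c(10, 10, 20, 20, 5, 5))

# Strongly unequal pairs: large statistic, p close to 0
res_skew <- chisq_embedding_prob(c(100, 0, 80, 0, 60, 0))
```

Applying the function to the frequencies of successive regions of an image gives one
probability per region, as suggested at the end of this section.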


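Now, let us partition the image into K regions with appropriate dimensions. There are many
approaches to take from here. We can explore the sample mean

$$\bar{X} = \frac{1}{K} \sum_{i=1}^{K} \chi_i^2,$$

with the assumption that the $\chi_i^2 \sim \chi^2_{\nu_i}$ are independent random
variables with mean $\nu_i$ and variance $2\nu_i$. Note that, approximately,
$\bar{X} \sim N(\mu, \sigma^2)$, where

$$\mu = \frac{1}{K} \sum_{i=1}^{K} \nu_i \quad \text{and} \quad \sigma^2 = \frac{1}{K^2} \sum_{i=1}^{K} 2\nu_i,$$

provided that K is large enough to induce normality. A special case arises when the
$\chi_i^2$ are i.i.d.

This case defines a particular framework, with

$$\mu = \nu \quad \text{and} \quad \sigma^2 = \frac{2\nu}{K}.$$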


6.  Entropy 


Let X be a random variable with a set of possible outcomes $A_X = \{a_1, \ldots, a_I\}$,
with corresponding probabilities $P_X = \{p_1, \ldots, p_I\}$. The entropy of X is defined
by

$$En(X) = \sum_{x \in A_X} -\Pr(x) \log \Pr(x),$$

with the convention that for $\Pr(x) = 0$, $0 \times \log(1/0) = 0$, since
$\lim_{\theta \to 0^+} \theta \log(1/\theta) = 0$.


Example 1

In an image with a uniform distribution of gray-level intensity, let us define
$A_X = \{0, \ldots, 255\}$. The corresponding (uniform) probability is given by
$p_i = 1/256$ for i = 1, ..., 256. Therefore, taking log to be $\log_2$, the entropy is
given by

$$\begin{aligned} En(X) &= \sum_{i=1}^{256} \Pr(x_i) \log_2(1/\Pr(x_i)) \\ &= 256 \times 1/256 \times \log_2(256) \\ &= 8 \log_2(2) \\ &= 8. \end{aligned}$$
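The computation of Example 1 can be reproduced in base R. In this sketch the function
names entropy_bits and image_entropy are hypothetical, and the pixel values are assumed
to be integers between 0 and 255.

```r
# Entropy in bits of a probability vector, with the convention 0 * log(1/0) = 0
entropy_bits <- function(p) {
  p <- p[p > 0]
  -sum(p * log2(p))
}

# Entropy of the gray-level histogram of a vector (or matrix) of pixel values 0..255
image_entropy <- function(pixels) {
  counts <- tabulate(pixels + 1, nbins = 256)
  entropy_bits(counts / length(pixels))
}

entropy_bits(rep(1 / 256, 256))   # uniform distribution over 256 gray levels
image_entropy(0:255)              # an "image" in which every gray level occurs once
```

Both calls reproduce the value obtained in Example 1, whereas an image with a single gray
level would have entropy 0.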


Example  2 

Analogously, in an RGB image whose three 8-bit channels are uniformly distributed and
independent, we can show that the entropy is 24.


8. Appendix

Consider closed n-dimensional intervals
$I = \{x : a_j \le x_j \le b_j,\ j = 1, \ldots, n\}$ and their volumes
$v(I) = \prod_{j=1}^{n} (b_j - a_j)$. To define the outer measure of an arbitrary subset
E of $\mathbb{R}^n$, cover E by a countable collection S of intervals $I_k$, and let

$$\sigma(S) = \sum_{I_k \in S} v(I_k). \tag{22}$$

Definition 3 (Lebesgue outer measure (or exterior measure)) The Lebesgue outer measure
of E, denoted $|E|_e$, is defined by

$$|E|_e = \inf \sigma(S), \tag{23}$$

where the infimum is taken over all such covers S of E. Thus, $0 \le |E|_e \le +\infty$.

Definition 4 (Lebesgue Measurable or Measurable) A subset $E \subset \mathbb{R}^n$ is said
to be Lebesgue measurable, or simply measurable, if given $\epsilon > 0$, there exists an
open set G such that

$$E \subset G \quad \text{and} \quad |G - E|_e < \epsilon. \tag{24}$$

Let f be a real-valued function defined on a set E in $\mathbb{R}^n$, that is,
$-\infty \le f(x) \le +\infty$, $x \in E$. Then f is called a Lebesgue measurable function
on E, or simply a measurable function, if for every finite a, the set

$$\{x \in E : f(x) > a\}$$

is a measurable subset of $\mathbb{R}^n$. We may use the notation $\{f > a\}$ for
$\{x \in E : f(x) > a\}$.


Definition 5 (Banach Space) A set X is called a Banach space over the complex numbers
if it satisfies the following conditions:

1. X is a linear space over the complex numbers $\mathbb{C}$; that is, if $x, y \in X$ and
$\alpha \in \mathbb{C}$, then $x + y \in X$ and $\alpha x \in X$.

2. X is a normed space; that is, for every $x \in X$ there is a non-negative number
$\|x\|$ such that

a) $\|x\| = 0$ if and only if x is the zero element in X.

b) $\|\alpha x\| = |\alpha|\, \|x\|$.

c) $\|x + y\| \le \|x\| + \|y\|$.

If these conditions are fulfilled, $\|x\|$ is called the norm of x.

3. X is complete with respect to its norm; that is, every Cauchy sequence in X
converges in X, or, if $\|x_k - x_m\| \to 0$ as $k, m \to \infty$, then there is an
$x \in X$ such that $\|x_k - x\| \to 0$.

If E is a measurable subset of $\mathbb{R}^n$ and p satisfies $0 < p < \infty$, then
$L^p(E)$ denotes the collection of measurable functions f for which $\int_E |f|^p$ is
finite, that is

$$L^p(E) = \left\{ f : \int_E |f|^p < +\infty \right\}, \quad 0 < p < \infty. \tag{25}$$

From here we can mention the $L^1$ and $L^2$ classes. L may be used instead of $L^1$.

Definition 6 (Linear Functional) If B is a Banach space (or more generally, a normed
linear space) over the real numbers, a real-valued linear functional l on B is by
definition a real-valued function l(f), $f \in B$, which satisfies linearity, i.e.,

$$l(f_1 + f_2) = l(f_1) + l(f_2), \quad \text{and} \quad l(\alpha f) = \alpha\, l(f), \quad -\infty < \alpha < \infty.$$

Integration is an example of a linear functional:

$$I(x) = \int_a^b x(t)\, dt,$$

where x(t) is an integrable function defined on the interval (a, b). We can define linear
functionals by using different kernels, which determine the properties of the resulting
transforms. Integral transforms are often used to reduce the complexity of mathematical
problems.

One of the best known and most widely used linear functional transformations is the
Fourier transform.

Definition 7 (Fourier Transform) For $f \in L(\mathbb{R})$, define the Fourier transform
$\hat{f}$ of f by

$$\hat{f}(x) = \int_{-\infty}^{\infty} f(t)\, e^{-ixt}\, dt, \quad (x \in \mathbb{R}). \tag{26}$$

Definition 8 (Inner product) For $f, g \in L^2$, the inner product of f and g is defined by

$$(f, g) = \int f \bar{g}, \tag{27}$$

where $\bar{g}$ denotes the complex conjugate of g. The norm of f is given by
$\|f\|^2 = (f, f)$.

Definition 9 (Orthogonal and orthonormal) If (f, g) = 0, f and g are said to be
orthogonal. A set $\{\phi_\alpha\}_{\alpha \in A}$ is orthogonal if any two of its elements
are orthogonal. Furthermore, $\{\phi_\alpha\}_{\alpha \in A}$ is orthonormal if it is
orthogonal and $\|\phi_\alpha\| = 1$ for all $\alpha$.


A particular case of the Fourier transform is the following:

$$F(x) = \hat{f}(2\pi x) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i x t}\, dt, \tag{28}$$

where the kernel is given by $k(x, t) = e^{-2\pi i x t}$ and $2\pi x$ is the angular
frequency.

Early in the 1800s the French mathematician Joseph Fourier introduced what is now called
the Fourier series. Given any orthonormal system $\{\phi_k\}$ for $L^2$, if $f \in L^2$,
the numbers defined by

$$c_k = (f, \phi_k) \tag{29}$$

are called the Fourier coefficients of f with respect to $\{\phi_k\}$. The series
$\sum_k c_k \phi_k$ is called the Fourier series of f with respect to $\{\phi_k\}$ and is
denoted by S[f].

A system of complex-valued functions $\{\phi_\alpha(x)\}$, all in $L^2(E)$, is called
orthogonal over E if

$$\int_E \phi_k \bar{\phi}_l = \begin{cases} 0, & k \ne l, \\ \lambda_k > 0, & k = l. \end{cases} \tag{31}$$

If $(\phi_\alpha, \phi_\alpha) = 1$ for all $\alpha$, the orthogonal system is called
normal, or orthonormal. If $\{\phi_\alpha\}$ is orthogonal, the system

$$\{\phi_\alpha / \|\phi_\alpha\|_2\}$$

is orthonormal.

Given any complex-valued $f \in L^2(E)$, we call the numbers

$$c_\alpha = \int_E f \bar{\phi}_\alpha \tag{32}$$

the Fourier coefficients.

We shall now consider a special orthogonal system, the trigonometric system. This
name is given to the system of functions

$$e^{ikx} = \cos kx + i \sin kx, \quad k = 0, \pm 1, \pm 2, \ldots. \tag{33}$$

Note furthermore that the functions

$$\cos kx = \frac{e^{ikx} + e^{-ikx}}{2}, \quad \sin kx = \frac{e^{ikx} - e^{-ikx}}{2i}, \quad k = 1, 2, \ldots$$

are orthogonal over any interval Q of length $2\pi$, or what is the same thing, the
functions

$$\cos x, \sin x, \ldots, \cos kx, \sin kx, \ldots,$$

are orthogonal over any interval Q of length $2\pi$.


Note that any $f \in L(Q)$ can be developed into a Fourier series

$$f \sim \frac{1}{2} a_0 + \sum_{k=1}^{\infty} (a_k \cos kx + b_k \sin kx), \tag{34}$$

where

$$a_0 = \left(\frac{\pi}{2}\right)^{-1} \int_Q f(t)\, \frac{1}{2}\, dt = \frac{1}{\pi} \int_Q f(t)\, dt, \tag{35}$$

$$a_k = \frac{1}{\pi} \int_Q f(t) \cos kt\, dt, \quad b_k = \frac{1}{\pi} \int_Q f(t) \sin kt\, dt. \tag{36}$$

The numbers $\{a_k\}$ and $\{b_k\}$ are called the Fourier cosine and sine coefficients of
f, respectively.

Note that if $Q = (-\pi, \pi)$ and f is an even function, i.e., if f(-x) = f(x), then

$$a_k = \frac{2}{\pi} \int_0^{\pi} f(t) \cos kt\, dt, \quad b_k = 0.$$

Now, if f is an odd function, i.e., if f(-x) = -f(x), then

$$a_k = 0, \quad b_k = \frac{2}{\pi} \int_0^{\pi} f(t) \sin kt\, dt.$$

Restricting the complex kernel to its real part, we obtain the Fourier cosine transform
(FCT):

$$\mathrm{Re}[e^{i\omega t}] = \cos(\omega t) = \frac{e^{i\omega t} + e^{-i\omega t}}{2}. \tag{37}$$

So the Fourier cosine transform of a real or complex valued function f(t), which is
defined over non-negative values of t with non-negative angular frequency $\omega$, is

$$F[f](\omega) = \int_0^{\infty} f(t) \cos \omega t\, dt. \tag{38}$$


Now for the discrete case, a particular case of the Fourier transform is given by

$$f(n) = \sum_{k=0}^{N-1} c_k\, e^{i 2\pi k n / N}, \tag{39}$$

where $c_k$ are Fourier coefficients defined as

$$c_k = \frac{1}{N} \sum_{n=0}^{N-1} f(n)\, e^{-i 2\pi k n / N}. \tag{40}$$

This is called the Discrete Fourier transform (DFT).
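The pair of sums defining the DFT can be checked numerically in base R. In the sketch
below dft_coeffs and inverse_dft are hypothetical names; the comparison uses the base R
function fft, which computes the same sum as the coefficients but without the factor 1/N.

```r
# Fourier coefficients c_k = (1/N) * sum_n f(n) exp(-i 2 pi k n / N)
dft_coeffs <- function(f) {
  N <- length(f)
  n <- 0:(N - 1)
  sapply(0:(N - 1), function(k) sum(f * exp(-2i * pi * k * n / N)) / N)
}

# Reconstruction f(n) = sum_k c_k exp(i 2 pi k n / N)
inverse_dft <- function(ck) {
  N <- length(ck)
  k <- 0:(N - 1)
  sapply(0:(N - 1), function(n) sum(ck * exp(2i * pi * k * n / N)))
}

f <- c(1, 2, 0, -1, 3)                 # hypothetical signal
ck <- dft_coeffs(f)
max(Mod(ck - fft(f) / length(f)))      # difference to fft, up to rounding error
max(Mod(inverse_dft(ck) - f))          # reconstruction error, up to rounding error
```

The direct sums cost on the order of N^2 operations and are shown only to mirror the
formulas; in practice fft would be used.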

The multidimensional transforms are a simple extension of the one-dimensional case.
The 2-dimensional DFT is defined as

$$\hat{x}(k_1, k_2) = \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x(n_1, n_2)\, e^{-i 2\pi \left( \frac{n_1 k_1}{N_1} + \frac{n_2 k_2}{N_2} \right)}, \tag{41}$$

$$k_1 = 0, \ldots, N_1 - 1, \quad k_2 = 0, \ldots, N_2 - 1.$$


For the discrete case, the Discrete Cosine Transform (DCT) can be obtained from the
DFT. At this point we have to point out that f must be an even function, defined
symmetrically over $(-\infty, \infty)$. There are four types of DCT. Defining first the
coefficient matrix, we have the following types.


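Type I.

$$[M_{I}]_{mn} = \left(\frac{2}{N}\right)^{1/2} k_m k_n \cos\left[\frac{mn\pi}{N}\right], \quad m, n = 0, 1, \ldots, N.$$

Type II.

$$[M_{II}]_{mn} = \left(\frac{2}{N}\right)^{1/2} k_m \cos\left[\frac{m(n + 1/2)\pi}{N}\right], \quad m, n = 0, 1, \ldots, N - 1.$$

Type III.

$$[M_{III}]_{mn} = \left(\frac{2}{N}\right)^{1/2} k_n \cos\left[\frac{(m + 1/2)n\pi}{N}\right], \quad m, n = 0, 1, \ldots, N - 1.$$

Type IV.

$$[M_{IV}]_{mn} = \left(\frac{2}{N}\right)^{1/2} \cos\left[\frac{(m + 1/2)(n + 1/2)\pi}{N}\right], \quad m, n = 0, 1, \ldots, N - 1.$$

where

$$k_j = \begin{cases} 1 & \text{if } j \ne 0 \text{ or } N, \\ 1/\sqrt{2} & \text{if } j = 0 \text{ or } N. \end{cases}$$

Type II is the most frequently used in many applications.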


The matrix obtained from Type II with N = 8 is shown below.

            [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]        [,8]
[1,]  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339  0.35355339
[2,]  0.49039264  0.41573481  0.27778512  0.09754516 -0.09754516 -0.27778512 -0.41573481 -0.49039264
[3,]  0.46193977  0.19134172 -0.19134172 -0.46193977 -0.46193977 -0.19134172  0.19134172  0.46193977
[4,]  0.41573481 -0.09754516 -0.49039264 -0.27778512  0.27778512  0.49039264  0.09754516 -0.41573481
[5,]  0.35355339 -0.35355339 -0.35355339  0.35355339  0.35355339 -0.35355339 -0.35355339  0.35355339
[6,]  0.27778512 -0.49039264  0.09754516  0.41573481 -0.41573481 -0.09754516  0.49039264 -0.27778512
[7,]  0.19134172 -0.46193977  0.46193977 -0.19134172 -0.19134172  0.46193977 -0.46193977  0.19134172
[8,]  0.09754516 -0.27778512  0.41573481 -0.49039264  0.49039264 -0.41573481  0.27778512 -0.09754516

Note that this matrix is an orthogonal matrix, i.e., its inverse equals its transpose. We
call this matrix T.


Now, to obtain the DCT of an 8 x 8 matrix M, under the assumption that the range of M is
appropriately symmetric about the origin, the DCT matrix for M is given by

D = TMT'

For example, given a picture and extracting an 8 x 8 block, we un-normalize it so that
the image's values are brought to the range [0, 255]. Once this is done, the resulting
un-normalized block must be leveled off by subtracting 128. This provides the symmetry
required of the image's values.

These values are called the un-quantized DCT coefficients and our matrix is in the DCT
domain.

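In order to obtain the DCT matrix of the whole image, let us define the following
matrices:

$$T_l = \begin{bmatrix} T & 0 & 0 & \cdots & 0 \\ 0 & T & 0 & \cdots & 0 \\ 0 & 0 & T & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & T \end{bmatrix} \quad \text{and} \quad T_r = \begin{bmatrix} T & 0 & 0 & \cdots & 0 \\ 0 & T & 0 & \cdots & 0 \\ 0 & 0 & T & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & T \end{bmatrix}$$

where l = (number of rows of the image)/8 and r = (number of columns of the image)/8, so
that there are precisely l T blocks in the diagonal of $T_l$ and r T blocks in the
diagonal of $T_r$. Given an image I that has been un-normalized and leveled off, with
dimensions that are multiples of 8 (dimension mod 8 = 0), the matrix

$$D = T_l I T_r', \tag{42}$$

is the DCT matrix of the image, in which the DCT has been applied to each 8 x 8 block.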

9.  R-Codes 

In the previous section we compared an original image with different modifications,
obtained by switching one bit. We accomplished this using the R language (2.14, 2.15). We
loaded the packages jpeg, ReadImages, boolfun, RGraphics and many others. The two crucial
packages were ReadImages and boolfun, which provide read.jpeg and toBin, respectively. The
first allows us to read an image in jpg or jpeg format. The second gives the binary
representation of an integer. Another function used from boolfun is toInt, which returns
an integer from a given binary representation.

The function below, imageManipulation2, switches one bit for each channel in every pixel
of an image in jpeg or jpg format. When the picture is read, a 3-dimensional array is
built. Each layer of the array contains the numeric representation of one color for every
pixel. The values in this array are real numbers between 0 and 1. These numbers are
normalized and are of the form n/255, where n = 0, 1, ..., 255. They are multiplied by
255, resulting in an integer ranging from 0 to 255, precisely 256 values.

The integers are converted to their binary representation. A particular bit is switched
in each binary sequence. This sequence is converted back to an integer and normalized. The
resulting array is an image different from the original.
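The bit switch itself does not require a conversion to a binary vector. The sketch below
does the same kind of operation with integer arithmetic in base R only. The name flip_bit
is hypothetical, and the bit position n is assumed here to be counted from the least
significant bit (n = 1), which may differ from the ordering used by toBin.

```r
# Switch bit n (n = 1 is the least significant bit) of every entry of a normalized image.
# x is an array whose entries are of the form k/255 with k = 0, ..., 255.
flip_bit <- function(x, n) {
  v <- round(x * 255)               # un-normalize to integers 0..255
  w <- 2^(n - 1)                    # weight of the bit to switch
  bit <- (v %/% w) %% 2             # current value of that bit (0 or 1)
  flipped <- v + (1 - 2 * bit) * w  # add the weight if the bit is 0, subtract if it is 1
  array(flipped / 255, dim = dim(x))
}

# Hypothetical 2 x 2 image with one channel
x <- array(c(0, 5, 255, 128) / 255, dim = c(2, 2, 1))
y <- flip_bit(x, 1)                 # switch the least significant bit
round(y * 255)
```

Because the whole array is processed at once, no loop over channels, rows and columns is
needed. Applying flip_bit twice with the same n returns the original image.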

#x is the image
#This function switches the bit at position n
#N is the length of the binary representation
#t is a title
#toBin and toInt come from the boolfun package
imageManipulation2 <- function(x,n,N,t)
{
  nrow <- dim(x)[1]
  ncol <- dim(x)[2]
  ncha <- dim(x)[3]
  for(i in 1:ncha)
  {
    for(j in 1:nrow)
    {
      for(k in 1:ncol)
      {
        if(n<=N && n>=1)
        {
          #un-normalization (multiplication by 255), rounded to the nearest integer
          m = round(x[j,k,i]*255)
          #toBin returns a binary representation of length N
          y = toBin(m,N)
          y[n] = !y[n]
          #toInt returns an integer from a binary representation of a certain length
          #normalization (division by 255)
          x[j,k,i] = toInt(y)/255
        }
      }
    }
  }
  plot(x,main=t)
  x
}

#######################################
#EmbedMessageImage
#This function embeds a message (m) of
#suitable length in an image (I)
#The embedding is achieved from the
#index {(2,1),(2,nc)} to
#{(nr-1,1),(nr-1,nc)} for each channel
#######################################
EmbedMessageImage <- function(I,m)
{
  #The message m is split into an array of characters
  s <- unlist(strsplit(m,split=NULL))
  #The length is assigned to l
  l <- length(s)
  #A copy J of I is made
  J <- I
  #The dimensions are assigned to nr (rows), nc (cols), nh (channels)
  nr <- dim(J)[1]
  nc <- dim(J)[2]
  nh <- dim(J)[3]
  #n is assigned the size of the region where the message is to be written
  n <- (nr-2)*nc
  #if the length of the message is <= the total capacity of the image (including all 3 channels)
  if(l<=n*nh)
  {
    #k is assigned the remainder of l divided by n (the length of the last segment)
    k <- l%%n
    #a message that fills its last channel exactly has a last segment of length n, not 0
    if(k==0 && l>0) k <- n
    #f is assigned the minimum number of channels needed to embed the message
    f <- (l-k)/n+1
    #in the loop, the message is embedded by channel and message segment
    for(i in 1:f)
    {
      #j is assigned the length of the message segment to be embedded
      j <- n*(i<f)+k*(i==f)
      #mx is assigned the message segment as a single string of characters, not an array
      mx <- paste(s[(n*(i-1)+1):(n*(i-1)+j)],collapse="")
      #embedding per channel per corresponding message segment
      J[2:(nr-1),,i] <- EmbedMessageMatrix(J[2:(nr-1),,i],mx)
    }
  }
  #once the message has been embedded, the message length is embedded as well
  EmbedMessageLength(J,l)
}

#######################################
#EmbedMessageMatrix
#This function embeds a message m in a
#matrix with dimensions (r x c). The
#embedding is achieved sequentially,
#starting at (1,1) in the matrix
#(channel) x.
#######################################
EmbedMessageMatrix <- function(x,m)
{
  #The message is converted from a string to an array of character codes
  #(charToInt is assumed to come from the R.oo package)
  s <- charToInt(unlist(strsplit(m,split=NULL)))
  l <- length(s)
  #An un-normalized copy of x is made
  y <- x*255
  nr <- dim(y)[1]
  nc <- dim(y)[2]
  #if the length of the message is suitable we proceed
  if(l<=nr*nc)
  {
    #k is the remainder of l divided by nc (the length of the last row segment)
    k <- l%%nc
    #a message that fills its last row exactly has a last segment of length nc, not 0
    if(k==0 && l>0) k <- nc
    #f is the minimum number of rows needed to embed m
    f <- (l-k)/nc+1
    #The message is embedded row by row
    for(i in 1:f)
    {
      j <- k*(i==f)+nc*(i<f)
      n <- (i-1)*nc
      y[i,1:j] <- s[(n+1):(n+j)]
    }
  }
  #y is normalized
  y/255.0
}

>  EmbedMessageLength 
function  (I,n) 

####################################### 

#EmbedMessageEength 

#This  function  embed  the  length  of  the 

#message  in  the  image.  The  embedding 

#is  achieved  by  eonverting  the  length 

#into  a  eharacter  string,  "123456". 

#Then,  this  string  is  splitted  as  an 
#array  of  characters,  'T","2",...,"6". 

#The,  this  is  converted  in  integers 
#according  to  the  aseii  code.  The 
#resulting  numbers  are  written  orderly 


#on  the  channels’  comers,  accordingly 
#to  the  following  coordinates 
#(l,l),(l,nc),(nr,l),(nr,nc).  Here,  nr 
#is  the  number  of  rows  and  nc  is  the 
#number  of  columns. 

####################################### 

{
  J <- I*255
  nr <- dim(J)[1]
  nc <- dim(J)[2]
  s <- unlist(strsplit(as.character(n),split=NULL))
  l <- length(s)
  s <- s[l:1]
  for(i in 1:l)
  {
    if(i==1)       { J[1,1,1]   <- charToInt(s[i]) }
    else if(i==2)  { J[1,nc,1]  <- charToInt(s[i]) }
    else if(i==3)  { J[nr,1,1]  <- charToInt(s[i]) }
    else if(i==4)  { J[nr,nc,1] <- charToInt(s[i]) }
    else if(i==5)  { J[1,1,2]   <- charToInt(s[i]) }
    else if(i==6)  { J[1,nc,2]  <- charToInt(s[i]) }
    else if(i==7)  { J[nr,1,2]  <- charToInt(s[i]) }
    else if(i==8)  { J[nr,nc,2] <- charToInt(s[i]) }
    else if(i==9)  { J[1,1,3]   <- charToInt(s[i]) }
    else if(i==10) { J[1,nc,3]  <- charToInt(s[i]) }
    else if(i==11) { J[nr,1,3]  <- charToInt(s[i]) }
    else if(i==12) { J[nr,nc,3] <- charToInt(s[i]) }
  }
  if(l<12)
  {
    for(k in (l+1):12)
    {
      if(k==1)       { J[1,1,1]   <- charToInt("0") }
      else if(k==2)  { J[1,nc,1]  <- charToInt("0") }
      else if(k==3)  { J[nr,1,1]  <- charToInt("0") }
      else if(k==4)  { J[nr,nc,1] <- charToInt("0") }
      else if(k==5)  { J[1,1,2]   <- charToInt("0") }
      else if(k==6)  { J[1,nc,2]  <- charToInt("0") }
      else if(k==7)  { J[nr,1,2]  <- charToInt("0") }
      else if(k==8)  { J[nr,nc,2] <- charToInt("0") }
      else if(k==9)  { J[1,1,3]   <- charToInt("0") }
      else if(k==10) { J[1,nc,3]  <- charToInt("0") }
      else if(k==11) { J[nr,1,3]  <- charToInt("0") }
      else if(k==12) { J[nr,nc,3] <- charToInt("0") }
    }
  }
  J/255.0
}
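
The corner scheme can be exercised on a small synthetic array. What follows is a minimal sketch under stated assumptions, not the report's code: it uses base R only, substitutes `utf8ToInt` and `intToUtf8` for `charToInt` and `intToChar`, and the helper names `corner_index`, `embed_length` and `extract_length` are hypothetical.

```r
# Corner coordinates in the order used by the listing: for each channel,
# (1,1), (1,nc), (nr,1), (nr,nc). Hypothetical helper.
corner_index <- function(nr, nc) {
  rows <- rep(c(1, 1, nr, nr), times = 3)
  cols <- rep(c(1, nc, 1, nc), times = 3)
  chan <- rep(1:3, each = 4)
  cbind(rows, cols, chan)
}

# Write the decimal digits of n, left-padded with "0" to 12 characters,
# into the corners; the last digit goes into the first corner.
embed_length <- function(I, n) {
  J <- I * 255
  digits <- rev(utf8ToInt(formatC(n, width = 12, flag = "0", format = "d")))
  J[corner_index(dim(J)[1], dim(J)[2])] <- digits
  J / 255
}

# Read the twelve corner values back and rebuild the number.
extract_length <- function(I) {
  J <- round(I * 255)   # rounding guards against floating-point error
  codes <- J[corner_index(dim(J)[1], dim(J)[2])]
  as.numeric(intToUtf8(rev(codes)))
}

I <- array(0.5, dim = c(4, 5, 3))
len <- extract_length(embed_length(I, 123456))
print(len)
```

Indexing the array with a three-column matrix replaces the twelve-way branch with a single assignment; the image is assumed to have three channels and at least two rows and two columns so that the corners are distinct.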

> ExtractMessageImage
function (I)
#######################################
#ExtractMessageImage
#This function extracts a message from
#an image. It is assumed that the
#message is embedded according to the
#algorithm employed in
#EmbedMessageImage.
#######################################
{
  nr <- dim(I)[1]
  nc <- dim(I)[2]
  nh <- dim(I)[3]
  l <- ExtractMessageLength(I)
  n <- (nr-2)*nc
  k <- l%%n
  f <- (l-k)/n+1
  s <- ""
  for(i in 1:f)
  {
    j <- n*(i<f)+k*(i==f)
    s <- paste0(s,ExtractMessageMatrix(I[2:(nr-1),,i],j))
  }
  s
}

> ExtractMessageMatrix
function (x,l)
#######################################
#ExtractMessageMatrix
#This function extracts a message from
#a matrix. It is assumed that the
#message is embedded according to the
#algorithm employed in EmbedMessageMatrix.
#######################################
{
  paste(intToChar(c(t(x))[1:l]*255))
}


> ExtractMessageLength
function(I)
#######################################
#ExtractMessageLength
#This function extracts the length from
#an image. It is assumed that the
#length of the message is embedded
#according to the algorithm employed
#in EmbedMessageLength.
#######################################
{
  J <- I*255
  nr <- dim(J)[1]
  nc <- dim(J)[2]
  s <- intToChar(J[1,1,1])
  s <- paste(intToChar(J[1,nc,1]),s,sep="")
  s <- paste(intToChar(J[nr,1,1]),s,sep="")
  s <- paste(intToChar(J[nr,nc,1]),s,sep="")
  s <- paste(intToChar(J[1,1,2]),s,sep="")
  s <- paste(intToChar(J[1,nc,2]),s,sep="")
  s <- paste(intToChar(J[nr,1,2]),s,sep="")
  s <- paste(intToChar(J[nr,nc,2]),s,sep="")
  s <- paste(intToChar(J[1,1,3]),s,sep="")
  s <- paste(intToChar(J[1,nc,3]),s,sep="")
  s <- paste(intToChar(J[nr,1,3]),s,sep="")
  s <- paste(intToChar(J[nr,nc,3]),s,sep="")
  unlist(as.numeric(s))
}

> EmbedMessageLSB
function (I,m)
###########################################
#EmbedMessageLSB
#This function embeds a message in an image
#using the LSB technique.
###########################################
{
  s <- binaryRep(m)
  l <- length(s)
  m <- dim(I)[1]
  n <- dim(I)[2]
  if(l <= 3*(m-2)*n)
  {
    J <- EmbedMessageLength(I,l)
    k <- ceiling(l/3)
    J[2:(m-1),,1] <- EmbedMessageMatrixLSB(J[2:(m-1),,1],s[1:k])
    if(k < l)
    {
      J[2:(m-1),,2] <- EmbedMessageMatrixLSB(J[2:(m-1),,2],s[(k+1):(2*k)])
      if(2*k < l)
      {
        J[2:(m-1),,3] <- EmbedMessageMatrixLSB(J[2:(m-1),,3],s[(2*k+1):l])
      }
    }
  }
  else
  {
    J <- "Error: length of message surpass image dimension"
  }
  J
}

> EmbedMessageMatrixLSB
function (i,s)
#########################################
#EmbedMessageMatrixLSB
#This function embeds a message into a
#channel.
#########################################
{
  l <- length(s)
  m <- nrow(i)
  n <- ncol(i)
  k <- floor(m*n/l)
  ind <- seq(1,m*n,k)[1:l]
  jj <- c(t(i))
  j <- ToBin(jj[ind]*255,8)
  j[,8] <- s
  jj[ind] <- ToInt(j)/255.0
  JJ <- matrix(jj,ncol=n,nrow=m,byrow=TRUE)
  JJ
}
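
The spreading step, in which the message bits are written at evenly spaced positions of the row-major pixel vector, can be illustrated with base R bitwise operators. This is a sketch under assumptions rather than the report's routine: pixel values are taken as integers in 0..255 instead of values scaled to [0,1], the bit touched is the arithmetic least significant bit, and `embed_lsb` and `extract_lsb` are hypothetical names.

```r
# Embed a 0/1 vector into the least significant bit of evenly spaced
# entries of an integer matrix (row-major order).
embed_lsb <- function(x, bits) {
  l <- length(bits)
  m <- nrow(x); n <- ncol(x)
  k <- floor(m * n / l)                 # spacing between carrier pixels
  ind <- seq(1, m * n, by = k)[1:l]     # carrier positions
  v <- c(t(x))                          # row-major vector of pixel values
  v[ind] <- bitwOr(bitwAnd(v[ind], 254L), as.integer(bits))
  matrix(v, nrow = m, ncol = n, byrow = TRUE)
}

# Read the bits back from the same positions.
extract_lsb <- function(x, l) {
  m <- nrow(x); n <- ncol(x)
  k <- floor(m * n / l)
  ind <- seq(1, m * n, by = k)[1:l]
  bitwAnd(c(t(x))[ind], 1L)
}

set.seed(1)
x <- matrix(sample(0:255, 48, replace = TRUE), nrow = 6, ncol = 8)
bits <- c(1, 0, 1, 1, 0, 0, 1, 0)
y <- embed_lsb(x, bits)
recovered <- extract_lsb(y, length(bits))
```

Because only the lowest bit of each carrier pixel is overwritten, no pixel value changes by more than 1.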

> ExtractMessageLSB
function (I)
###########################################
#ExtractMessageLSB
#This function extracts a message from a
#channel using the LSB technique.
###########################################
{
  l <- ExtractMessageLength(I)
  m <- dim(I)[1]
  n <- dim(I)[2]
  k <- ceiling(l/3)
  k1 <- floor((m-2)*n/k)
  ind1 <- seq(1,(m-2)*n,k1)[1:k]
  M <- ToBin(c(t(I[2:(m-1),,1]))[ind1]*255,8)[,8]
  if(k < l)
  {
    k2 <- k1
    ind2 <- ind1
    M <- c(M,ToBin(c(t(I[2:(m-1),,2]))[ind2]*255,8)[,8])
    if(2*k < l)
    {
      k3 <- floor((m-2)*n/(l-2*k))
      ind3 <- seq(1,(m-2)*n,k3)[1:(l-2*k)]
      M <- c(M,ToBin(c(t(I[2:(m-1),,3]))[ind3]*255,8)[,8])
    }
  }
  paste(intToChar(ToInt(matrix(M,nrow = l/8,ncol=8,byrow=TRUE))))
}

> binaryRep
function (m)
########################################
#binaryRep
#This function returns a binary string
#representing the message m. First, each
#character is converted to an integer
#(ASCII code). These integers are
#converted to binary strings of length 8
#returning a string of all binary rep
#in an array.
########################################
{
  c(t(ToBin(charToInt(unlist(strsplit(m,split=NULL))),8)))
}
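
The conversion from a message to a bit vector and back can be tried without the report's helper functions. The sketch below assumes base R, ASCII input, and a most-significant-bit-first order within each byte (the listing does not settle the bit order); `string_to_bits` and `bits_to_string` are hypothetical names.

```r
# Convert a string to a 0/1 vector, 8 bits per character,
# most significant bit first (an assumption of this sketch).
string_to_bits <- function(m) {
  codes <- utf8ToInt(m)
  # For each code, read bits 7..0 with integer division and modulo.
  c(sapply(codes, function(v) (v %/% 2^(7:0)) %% 2))
}

# Inverse: group the bits in rows of 8 and weight them by powers of two.
bits_to_string <- function(b) {
  mat <- matrix(b, ncol = 8, byrow = TRUE)
  intToUtf8(as.integer(mat %*% 2^(7:0)))
}

b <- string_to_bits("Hi")
print(b)
print(bits_to_string(b))
```

A message of `nchar(m)` characters therefore needs `8 * nchar(m)` carrier positions, which is the quantity compared with the image capacity before embedding.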

> ToInt
function (M)
########################################
#ToInt
#ToInt is a modified version of toInt. This accepts arrays of binary strings
########################################
{
  res <- c()
  for (i in 1:nrow(M))
  {
    res <- c(res,toInt(M[i,]))
  }
  res
}

> ToBin
function (a, n)
########################################
#ToBin
#ToBin is a modified version of toBin. This accepts arrays of binary strings
########################################
{
  a <- as.integer(a)
  n <- as.integer(n)
  if(n == 0)
    res <- c()
  else {
    res <- matrix(nrow=length(a), ncol=n)
    for (i in 1:n)
    {
      res[,i] <- a%%2
      a <- a%/%2
    }
  }
  res
}


bc2p
function (y,d,mn,xl,yl,ty)
{
  nr <- dim(y)[1]/8
  nc <- dim(y)[2]/8
  V <- array(dim=c(nr,nc,3))
  x <- DCTc(y)
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      dctc8 <- c(x[ri:rf,ci:cf])
      m <- min(dctc8)
      M <- max(dctc8)
      k1 <- floor((m+d/2)/d)
      k2 <- ceiling((M-d/2)/d)
      cc <- c()
      for(l in k1:k2)
      {
        cc <- c(cc,length(chi(dctc8,l,d)))
      }
      v <- Chi2Prob(cc)
      for(k in 1:3)
      {
        V[i,j,k] <- v[k]
      }
    }
  }
  plot(V[,,3][V[,,3]!="NaN"],type=ty,main=mn,xlab=xl,ylab=yl)
}

bc2ptest
function (y,d)
{
  nr <- dim(y)[1]/8
  nc <- dim(y)[2]/8
  V <- array(dim=c(nr,nc,3))
  x <- DCTc(y)
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      if(d!=0)
      {
        k1 <- freqlowerbound(min(x[ri:rf,ci:cf]),d)
        k2 <- frequpperbound(max(x[ri:rf,ci:cf]),d)
      }
      cc <- c()
      for(l in k1:k2)
      {
        cc <- c(cc,length(chi(x[ri:rf,ci:cf],l,d)))
      }
      V[i,j,1:3] <- Chi2Prob(cc)
    }
  }
  V
}

bc2ptest2
function (y,d)
{
  nr <- dim(y)[1]/8
  nc <- dim(y)[2]/8
  V <- array(dim=c(nr,nc,3))
  x <- DCTc(y)
  cc <- c()
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      if(d!=0)
      {
        k1 <- freqlowerbound(x[ri:rf,ci:cf],d)[2]
        k2 <- frequpperbound(x[ri:rf,ci:cf],d)[2]
      }
      for(l in k1:k2)
      {
        cc <- c(cc,length(chi(x[ri:rf,ci:cf],l,d)))
      }
    }
  }
  cc
}

binaryRep
function (m)
########################################
#binaryRep
#This function returns a binary string
#representing the message m. First, each
#character is converted to an integer
#(ASCII code). These integers are
#converted to binary strings of length 8
#returning a string of all binary rep
#in an array.
########################################
{
  c(t(ToBin(charToInt(unlist(strsplit(m,split=NULL))),8)))
}

BlockChi2Prob
function (x)
#################################################
#BlockChi2Prob
#BlockChi2Prob returns the probability of
#embedding for each block
#################################################
{
  nr <- dim(x)[1]/8
  nc <- dim(x)[2]/8
  V <- array(dim=c(nr,nc,3))
  dt <- DCTc(x)
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      v <- Chi2Prob(table(dt[ri:rf,ci:cf]))
      for(k in 1:3)
      {
        V[i,j,k] <- v[k]
      }
    }
  }
  V
}


BlockChi2Prob2
function (x)
{
  nr <- dim(x)[1]/8
  nc <- dim(x)[2]/8
  V <- array(dim=c(nr,nc,3))
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      v <- Chi2Prob2(x[ri:rf,ci:cf]*255)
      for(k in 1:3)
      {
        V[i,j,k] <- v[k]
      }
    }
  }
  V
}

BlockChi2Value
function (x)
{
  nr <- dim(x)[1]/8
  nc <- dim(x)[2]/8
  chisqvalues <- matrix(nrow=nr,ncol=nc)
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      chisqvalues[i,j] <- Chi2Value(x[ri:rf,ci:cf]*255)
    }
  }
  chisqvalues
}
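
The block functions above share one traversal pattern: split the image into 8x8 blocks and compute one statistic per block. A possible generic form is sketched below; `block_apply` is a hypothetical helper that does not appear in the report, and the statistic `f` is assumed to return a single number.

```r
# Apply a function to every size x size block of a matrix and collect
# the results in an (nr x nc) matrix. Hypothetical helper.
block_apply <- function(x, f, size = 8) {
  nr <- nrow(x) %/% size    # whole blocks only; partial edge blocks are ignored
  nc <- ncol(x) %/% size
  out <- matrix(NA_real_, nrow = nr, ncol = nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      rows <- (size * (i - 1) + 1):(size * i)
      cols <- (size * (j - 1) + 1):(size * j)
      out[i, j] <- f(x[rows, cols])
    }
  }
  out
}

x <- matrix(1:(16 * 24), nrow = 16, ncol = 24)
block_means <- block_apply(x, mean)
print(block_means)
```

Using `seq_len` and integer division keeps the loops from running when the image is smaller than one block, a case in which `1:nr` would count backwards.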

BlockChi2Value2
function (x)
{
  nr <- dim(x)[1]/8
  nc <- dim(x)[2]/8
  chisqvalues <- matrix(nrow=nr,ncol=nc)
  for(i in 1:nr)
  {
    for(j in 1:nc)
    {
      ri <- 8*(i-1)+1
      rf <- 8*i
      ci <- 8*(j-1)+1
      cf <- 8*j
      chisqvalues[i,j] <- Chi2Value2(x[ri:rf,ci:cf]*255)
    }
  }
  chisqvalues
}
Abstract

In this ongoing research we study the problem of
detecting the existence of a hidden message in an
image. JPEG is the image format we work with;
however, the study is not limited to this type.
Many techniques for detecting embedded messages
have been proposed. First, we study different
mechanisms of embedding and how they affect
randomness. We implemented several programs in
the R language. The embedding methods considered
range from very simple to more complex, using the
LSB technique, spreading, and an equal distribution
of a message between channels. We study two
statistical approaches to steganalysis. First, we
study the χ² statistical test, which works on the
frequencies of adjacent DCT coefficients. Second,
we study the likelihood ratio test (LRT) for the
implementation of Quantization Index Modulation
(QIM) and Dither Modulation (DM) to study the
Least Significant Bit (LSB) embedding technique.
The LSB is the focus of our study because its
modification cannot be perceived by the human eye,
i.e., it attracts no significant attention. We
calculate DCT coefficient matrices using block
diagonal matrices to avoid the use of loops. Code
was written in the R language to analyze the
frequencies of the DCT coefficients. Furthermore,
we use the concept of QIM for the quantized values
obtained by grouping the DCT coefficients.

Keywords: Steganography, steganalysis, LSB, DCT,
QIM, Shannon's entropy, R language, χ² test

Introduction 

The object of steganography is to hide the fact that
some sort of communication is taking place. This is
the ground for the use of the Least Significant Bit.
The modification of a bit in a pixel has an impact
on the level of visual perception. As we move along
the binary representation from the least significant
bit to the more significant ones, we become able to
perceive visually that a change has taken place. For
example, Figure 2 is the result of changing the 8th
bit in the binary representation for each color from
Figure 1. Note that each pixel is represented by a
string of 8 bits in a monochrome picture and 24 in an
RGB picture. From black to white, we have 2^8 = 256
different tones of gray. In an RGB image we have
256^3 = 16,777,216 colors (256 values for each color).
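
The dependence of the visual impact on the bit position can be checked numerically. The sketch below assumes an 8-bit gray value and counts bit 1 as the least significant and bit 8 as the most significant; this numbering is an assumption made for the example.

```r
# Effect of flipping each single bit of an 8-bit gray value.
value <- 100L
masks <- as.integer(2^(0:7))        # bit 1 (least significant) ... bit 8
flipped <- bitwXor(value, masks)    # one flipped bit per entry
change <- abs(flipped - value)
print(change)

# Number of gray tones and of RGB colors for 8 bits per channel.
tones <- 2^8
colors <- tones^3
```

Flipping bit k moves the value by 2^(k-1), so a change in the lowest bit shifts the tone by 1 out of 256 levels while a change in the highest bit shifts it by 128.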

Now, any visual hint of the existence of a
modification leads to the failure of the
steganography. In practice, no such modification
should be perceptible by the human eye.


Figure 1. Original


Figure 2. Modification of the 8th bit per pixel


Methodology 

In order to modify images we wrote our own code.
The code below, written in the R language, uses
the LSB modification technique.

function (I,m){
  s <- binaryRep(m)
  l <- length(s)
  m <- dim(I)[1]
  n <- dim(I)[2]
  ch <- dim(I)[3]
  z <- ch*(m-2)*n
  if(l <= z){
    J <- EmbedMessageLength(I,l)
    k <- ceiling(l/ch)
    J[2:(m-1),,1] <- EmbedMessageMatrixLSB(J[2:(m-1),,1],s[1:k])
    if(k < l){
      J[2:(m-1),,2] <- EmbedMessageMatrixLSB(J[2:(m-1),,2],s[(k+1):(2*k)])
      if((2*k) < l){
        J[2:(m-1),,3] <- EmbedMessageMatrixLSB(J[2:(m-1),,3],s[(2*k+1):l])
      }
    }
  }
  else{
    J <- "Error: length of message surpass image dimension"
  }
  return(J)
}


There are many approaches being used to detect
stego images. Discrete Cosine Transforms (DCT) play
a crucial role in many of these techniques. The DCT
is a special case of the Discrete Fourier Transform.
A version of the DCT used in our research is

$$D_{ij} = \frac{2}{n}\, C(i)\, C(j) \sum_{x=0}^{n-1} \sum_{y=0}^{n-1} m_{xy}\, d_n(x,i)\, d_n(y,j)$$

where

$$C(k) = \begin{cases} 1/\sqrt{2} & \text{if } k = 0 \\ 1 & \text{if } k > 0 \end{cases}, \qquad d_n(a,b) = \cos\left[\frac{(2a+1)\, b\, \pi}{2n}\right],$$

and $m_{xy}$ are the entries of a matrix $M$, and
$T = (T_{ij})_{ij}$, with $T_{ij} = \sqrt{2/n}\, C(i)\, d_n(j,i)$,
is an orthogonal matrix. We construct the DCT
coefficient matrix $D = (D_{xy})$ by taking the matrix
product $T M T'$. The matrix $M$ is transformed
beforehand so that its values are symmetric about 0.
We also construct block diagonal matrices to obtain
the image DCT coefficients. These block diagonal
matrices are of the form (d = dimension squared)

$$\Gamma_d = \begin{bmatrix} T & 0 & 0 & \cdots & 0 \\ 0 & T & 0 & \cdots & 0 \\ 0 & 0 & T & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & T \end{bmatrix}$$

so that a particular form of the DCT coefficient
image is obtained by $D = \Gamma_d M \Gamma_d'$, where
$D = (D_{xy})_{xy}$ and $M = (m_{xy})_{xy}$.

One approach is based on the fact that the
frequencies of the DCT coefficients are affected by
the presence of an encrypted hidden message. Given
$c_i$ and $c_i^*$, the frequencies of the color indices
before and after the embedding, respectively, the
effect can be described by

$$|c_{2i} - c_{2i+1}| \ge |c_{2i}^* - c_{2i+1}^*|.$$

By considering the DCT coefficient frequencies,

$$\chi^2 = \sum_{i=1}^{\nu+1} \frac{(y_i - y_i^*)^2}{y_i^*}, \qquad y_i^* = \frac{n_{2i} + n_{2i+1}}{2} \quad \text{and} \quad y_i = n_{2i},$$

with $n_i$ the frequencies of the DCT coefficients in
a region.

Given that this statistic approximately follows a
$\chi^2_\nu$ distribution, we may obtain the probability
of embedding $p = P(\chi^2_\nu > x)$, where $x$ is the
observed value. Figure 3 shows the probabilities of
embedding for different regions (8x8 sequential
blocks) using the image in Figure 1 to embed
6Napolen.txt from http://www.textfiles.com/stories/.

To detect stego images we may use Quantization Index
Modulation (QIM). Given a host signal $x$ with pmf
$P_X$, a message $m$ and a mapping $s(x,m)$, an ensemble
of discontinuous functions achieves the embedding
while minimizing the distortion
$D(s,x) = \frac{1}{N}\lVert s - x \rVert^2$, where $N$ is the
length of the host signal.


Figure 3. Probabilities per Sequential 8x8 Blocks

These discontinuities allow us to define the quantized values for a
given $\Delta^* > 0$: $x_k = k\Delta^*$, where $k$ is an integer. Let us
define the set $A_{\Delta^*} = \{k\Delta^* : k \text{ is an integer}\}$ and
$X(a, \Delta^*) = [a - \Delta^*/2,\ a + \Delta^*/2]$.
The pmf of the quantized cover, $X_q \in A_{\Delta^*}$, without embedding is
given by $P_{X_q}(a) = \sum_{x \in X(a,\Delta^*)} P_X(x)$. Analogously, the
pmf of the quantized embedding is defined on $A_{\Delta}$,
where $\Delta = 2\Delta^*$ and $a \in A_{\Delta}$.
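
The pmf of the quantized cover can be estimated from samples by rounding each value to the nearest lattice point. This is a minimal sketch under assumptions: the host signal is simulated from a normal distribution purely for illustration, the bins are taken half-open so that every value falls in exactly one bin, and `quantized_pmf` is a hypothetical name.

```r
# Empirical pmf of a host signal quantized to the lattice {k * delta}:
# each lattice point a collects the values in [a - delta/2, a + delta/2).
quantized_pmf <- function(x, delta) {
  q <- delta * floor(x / delta + 0.5)   # nearest lattice point
  table(q) / length(x)
}

set.seed(7)
x <- rnorm(5000, sd = 10)               # simulated host signal
p_cover <- quantized_pmf(x, delta = 2)
lattice <- as.numeric(names(p_cover))
```

The names of the returned table are the lattice points that actually occur in the sample, and the entries sum to one.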

Conclusions 

These probabilities lead us to the likelihood ratio test and to the
Shannon entropy. Given $p_b$ (with $q_b = 1 - p_b$), the probability of
embedding per block, $b \in B$ (the space of blocks), and assuming a
uniform distribution, i.e., $P(b) = p$ for $b \in B$, the entropy may be
obtained by

$$H(I) = -\sum_{b \in B} \left[ p_b\, p \log\{p_b\, p\} \right] - \sum_{b \in B} \left[ q_b\, p \log\{q_b\, p\} \right].$$

The latter result is to be studied.
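
The entropy expression can be evaluated directly from a vector of per-block embedding probabilities. The sketch below assumes the uniform block probability equals one over the number of blocks, uses natural logarithms, and adopts the convention that a zero term contributes nothing; `block_entropy` is a hypothetical name.

```r
# Entropy from per-block embedding probabilities pb, with a uniform
# block distribution P(b) = 1 / |B|.
block_entropy <- function(pb) {
  rho <- 1 / length(pb)
  term <- function(u) ifelse(u > 0, u * log(u), 0)   # 0 * log(0) taken as 0
  -sum(term(pb * rho)) - sum(term((1 - pb) * rho))
}

h_uncertain <- block_entropy(c(0.5, 0.5, 0.5, 0.5))  # every block undecided
h_certain   <- block_entropy(c(0, 1, 1, 0))          # every block decided
```

With four undecided blocks the eight joint outcomes are equally likely and the entropy is log(8); with four decided blocks only the choice of block remains uncertain and the entropy drops to log(4).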